
Gene cluster involved in nogalamycin biosynthesis, and its use in production of 
hybrid antibiotics 

Field of the invention 

This invention relates to the gene cluster for nogalamycin biosynthesis derived from
Streptomyces nogalater, and the use of the genes therein to obtain novel hybrid
antibiotics for drug screening.

Background of the invention 



Anthracyclines are antitumor antibiotics, mainly produced by Streptomyces sp.
The daunomycin family of anthracyclines is commercially the most important, since almost
all of the around ten anthracyclines currently in clinical use, or in late clinical trials
for cytotoxic drugs, belong to this family. Despite the long history of anthracyclines,
three decades or so, the studies on their biosynthesis are still going on, and there is
further interest in obtaining novel molecules for the development of cancer
chemotherapeutics. A method currently used for finding [...] by screening [...].
Cloning the genes for anthracycline biosynthesis facilitates the
production of hybrid anthracyclines, as well as their use in combinatorial biosynthesis
to generate novel molecules.

Nogalamycin, which was first described by Bhuyan and Dietz in 1965, is an
anthracycline antibiotic produced by Streptomyces nogalater. It is highly active
against tumor cells, whereas toxic properties of this compound have prevented its
progress to clinical trials (Bhuyan and Smith, 1975). However, menogaril
(7-O-methylnogarol) is a semisynthetic derivative of nogalamycin, and its value in the
treatment of cancer has been studied (e.g., [...] et al., 1990), the interest being
now mainly in [...]. Nogalamycin (Fig. 1) differs from most other
anthracyclines, e.g. from the daunomycin family, in two noteworthy features: (i)
the stereochemistry at position nine is opposite, and (ii) it has a sugar moiety in
which nogalamine is attached at position 1 by a typical glycosidic bond, but is also
attached to carbon 2 by an extraordinary C-C bond. Structural elucidation of
nogalamycin was reported by Wiley et al. (1977). Furthermore, biosynthetic studies
of nogalamycin were published by Wiley et al. in 1978, giving information on
the building blocks: the aglycone moiety is built from ten acetates; the neutral sugar,
nogalose, is derived from glucose; and the methyl groups of both sugars,
nogalamine and nogalose, are transferred from methionine. The origin of nogalamine
was not clearly solved by Wiley, but most probably nogalamine is also derived from
glucose.

Molecular cloning of biosynthesis genes for anthracyclines has facilitated the studies
on molecular genetics, providing tools for rational modifications of the structures,
and also for surprising combinations with other [...]. The interest has
focused on daunomycin biosynthesis genes, as reported in several publications
(Lomovskaya et al., 1998; Rajgarhia and Strohl, 1997 and references therein). Some
genes for aclacinomycin biosynthesis from S. galilaeus (Fujii and Ebizuka, 1997) and
for rhodomycin biosynthesis from S. [...] have been
cloned as well. We have cloned the biosynthesis genes for nogalamycin, and
successfully used the genes for producing hybrid anthracyclines. Most of the genes
are involved in the polyketide pathway, being responsible for the formation of a tricyclic
intermediate, and they are reported in Ylihonko et al., 1996a and b, and by Torkkell
et al., 1997. Despite the advances in molecular cloning, the biosynthetic pathway
from glucose to the sugars found in anthracyclines is still mainly hypothetical.

Regarding the genes for the deoxyhexose pathway, Madduri et al. (1998) have reported
that a gene derived from the avermectin biosynthesis cluster caused the production of
hybrid anthracyclines with an altered sugar moiety when transferred into an S. peucetius
mutant. The product obtained was epirubicin, a commercially important
anthracycline. In this case a hydroxy group in the daunosamine moiety was in the opposite
stereochemistry due to the action of an avermectin biosynthesis gene. S. galilaeus has
been used as the host to prepare hybrid anthracyclines using genes derived from the
rhodomycin pathway of S. purpurascens (Niemi et al., 1994), and from the nogalamycin
biosynthesis cluster of S. nogalater (Ylihonko et al., 1996a). The genes for the
nogalamycin pathway were also used to generate hybrid anthracycline production in S.
steffisburgensis, which typically produces steffimycin (Kunnari et al., 1997). Previously,
biosynthesis genes for actinorhodin have been expressed in S. galilaeus, resulting in
the formation of aloesaponarin (Strohl et al., 1991). These hybrid compounds were
modified in the aglycone moiety.

Summary of the invention

The present invention concerns a gene cluster of Streptomyces nogalater, most of the
genes of which are derived from the deoxyhexose pathway for nogalamine and
nogalose. When a DNA fragment of said region is expressed in S. galilaeus, which
produces aclacinomycins, hybrid anthracyclines are obtained in which the aglycone
moiety is derived from S. galilaeus, whereas the sugar moiety is characteristic neither
of S. nogalater nor of S. galilaeus. Furthermore, when the gene included in
said cluster that encodes a cyclase for nogalamycin is inserted into a suitable plasmid
construction, nogalamycinone is obtained, which is the aglycone of nogalamycin. As the
stereochemistry of nogalamycin differs from most other anthracyclines, using this gene
enables the preparation of C-9 stereoisomers of the anthracycline molecules.

Detailed description of the invention

The experimental procedures of the present invention are methods conventional in the 
art. The techniques not described in detail here are given in the manuals by Hopwood 
et al "Genetic manipulation of Streptomyces: a laboratory manual" The John Innes 
Foundation, Norwich (1985) and by Sambrook et al (1989) "Molecular cloning: a 
laboratory manual". The publications, patents and patent applications cited herein are 
given in the reference list in their entirety. 

The present invention concerns particularly the gene cluster for nogalamycin
biosynthesis (Sno5-cluster) causing the production of hybrid antibiotics with
modifications in the sugar moiety. The invention concerns specifically the use of the
genes for nogalamine/nogalose biosynthesis to generate hybrid antibiotics modified in
sugar moieties. The invention also concerns the use of a specific cyclase gene
included in the gene cluster of the invention to generate the C-9 stereoisomers of
typical anthracyclines.

The gene cluster according to the present invention is linked to the earlier reported
clusters for nogalamycin biosynthesis. The starting point of the present invention was
the gene cluster for the nogalamycin chromophore (International Patent Application WO
96/10581). Subsequently, we have found some genes for the deoxyhexose pathway of
nogalamycin biosynthesis (Torkkell et al., 1997), and a part of the fragment
comprising said genes was used to clone the genes for this invention.

The biosynthesis genes for nogalamycin can be isolated from Streptomyces sp.,
particularly from S. nogalater, which produces nogalamycin. Species which produce
nogalamycin-like anthracyclines can also be used, e.g. S. violaceochromogenes
producing arugomycin (Kawai et al., 1987), or S. avidinii producing avidinorubicin
(Aoki et al., 1991).



Genomic DNA of a Streptomyces strain carrying the genes for nogalamycin
biosynthesis is used in preparing a genomic library. Suitable gene fragments for
cloning may be obtained with any frequently cutting restriction enzyme. Typically
Sau3AI is used. The isolated fragments can be inserted by ligation into any
Escherichia coli vector such as a plasmid, a phagemid, a phage, or a cosmid. A
cosmid vector is preferred since it enables the cloning of large DNA fragments. A
cosmid vector such as pFD666 (ATCC No. 77286) is suitable for this purpose, as it
enables cloning of fragments of about 40 kb. The BamHI site of pFD666, which gives
sticky ends compatible with the Sau3AI fragments, may be used for cloning. Commercially
available kits may be used to package the DNA in phage particles. Various E. coli
strains can be used for infection with the packaged DNA. An appropriate E. coli
strain is, e.g., XL1 Blue MRF', which is deficient in several restriction systems.
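
The compatibility of Sau3AI fragments with a BamHI-cut vector can be illustrated with a short sketch. It relies only on the generally known recognition sequences (Sau3AI cuts before GATC, BamHI cuts after the first G of GGATCC), both of which leave a 5'-GATC overhang. The helper names and the toy sequence are hypothetical and serve only as an illustration.

```python
# Illustrative sketch: Sau3AI (^GATC) and BamHI (G^GATCC) leave the same
# 5'-GATC overhang, so Sau3AI fragments can be ligated into a BamHI site.
# The toy sequence and helper names are assumptions, not from the source.

SITES = {
    "Sau3AI": ("GATC", 0),    # top strand is cut before the G: ^GATC
    "BamHI": ("GGATCC", 1),   # top strand is cut after the first G: G^GATCC
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Return top-strand cut positions (0-based index of the first base after the cut)."""
    site, offset = SITES[enzyme]
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def overhang(seq: str, cut: int) -> str:
    """Return the 4-nt 5' overhang that begins at a top-strand cut position."""
    return seq[cut:cut + 4]


toy = "AAGGATCCTTTGATCAA"
bam_cuts = cut_positions(toy, "BamHI")
sau_cuts = cut_positions(toy, "Sau3AI")
bam_overhangs = {overhang(toy, c) for c in bam_cuts}
sau_overhangs = {overhang(toy, c) for c in sau_cuts}
print(bam_cuts, sau_cuts, bam_overhangs, sau_overhangs)
```

Every BamHI site contains a Sau3AI site, but the reverse does not hold, which is why a partial Sau3AI digest gives many more cut points than a BamHI digest would.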



Using E. coli as a host strain for the genomic library, hybridization is an
advantageous screening strategy. The probe for hybridization may be any known
fragment derived from the nogalamycin gene cluster, but a short fragment of about 1
kb derived from one end of the biosynthetic region previously cloned is preferred.
Colonies of the genomic library are transferred for filter hybridization to
membranes, preferably to nylon membranes. Since the average size of a genomic
DNA fragment is 40 kb, 2300 colonies give a 99.99% probability of finding the expanded
region for nogalamycin biosynthesis. Any method for hybridization may be used but,
in particular, the DIG System (Boehringer Mannheim GmbH, Germany) is useful.
Since the probe is homologous to the hybridized DNA, it is preferable to carry out
the stringent washes of hybridization at 70°C in a low salt concentration according to
Boehringer Mannheim's manual "The DIG System User's Guide for Filter Hybridization".
At least 80% homology is suggested to be needed for a DNA fragment to bind
a probe in the conditions used for washes.
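
The colony number quoted above can be checked with the standard library-size estimate of Clarke and Carbon, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size. The sketch below is only an illustration: the text does not state the genome size that was assumed, so the value of 10,000 kb used here is an assumption, chosen because it gives a result close to the 2300 colonies mentioned.

```python
import math


def clones_needed(insert_kb: float, genome_kb: float, probability: float) -> int:
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - f), with f = insert / genome."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


def coverage_probability(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Probability that a given sequence is present in a library of n_clones."""
    fraction = insert_kb / genome_kb
    return 1.0 - (1.0 - fraction) ** n_clones


# ASSUMPTION: genome size of 10,000 kb (not given in the text).
GENOME_KB = 10_000
n_required = clones_needed(40, GENOME_KB, 0.9999)
p_with_2300 = coverage_probability(2300, 40, GENOME_KB)
print(n_required, p_with_2300)
```

With a smaller assumed genome the required number of colonies drops accordingly, so the 5000 colonies actually screened leave a comfortable margin.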

Using this protocol, seven clones out of about 5000 gave positive signals, and were
picked up for DNA isolation. Restriction mapping is an appropriate technique for
characterizing the clones. The clones may be digested with convenient
restriction enzymes to demonstrate the physical linkage map of the DNA fragments.
The cosmid used for cloning was a shuttle cosmid replicating in both E. coli and
Streptomyces sp. However, the transfer of the recombinant cosmids into S. lividans
TK24, which is a typically used laboratory strain in cloning, resulted in
deletions, and was omitted. Instead, we used in the expression studies the
plasmid pIJ486, a high copy number Streptomyces plasmid. However, any plasmid
able to replicate stably in Streptomyces may be used for this purpose.

Two BglII fragments of one of the clones were separately inserted into pIJ486
vectors, and the two plasmids obtained were transferred into a primary host, S.
lividans TK24. The recombinant plasmids obtained (pSY42 and pSY43), containing a
10 kb and a 7 kb fragment from S. nogalater genomic DNA, respectively, were
isolated from the primary host and further introduced into S. galilaeus
by protoplast transformation. The 10 kb fragment
caused the production of hybrid anthracyclines in the S. galilaeus mutant strain
H039, which endogenously produces aklavinone-rhodinose-rhodinose-rhodinose. A
few other S. galilaeus strains (H075, H026, H063) mutated in the deoxyhexose pathway
for sugars in aclacinomycin were used in transformation, and new hybrid compounds
were obtained. Since the structure of nogalamycin is almost unique among
anthracyclines, the plasmids could be transferred to other anthracycline-producing
strains, such as S. peucetius, which produces daunomycin, and S. purpurascens,
which produces rhodomycins, to modify the structures of the characteristic
antibiotics.



As the cloned cluster was linked to the nogalamycin biosynthesis region already known,
its ability to generate the modification in the sugar moiety suggested the presence of the
genes for the deoxyhexose pathway. However, sequencing is necessary to deduce the
function of the genes in the cluster cloned. The DNA fragments of 10 kb and 7 kb
were further inserted into the plasmid pSL1190 for subcloning. Sequencing strategies
such as a deletion set of the DNA fragments, shotgun cloning or primer walking
could be used, but we prefer to use restriction fragments for subcloning. Using the ABI
PRISM system (Perkin-Elmer) for sequencing it is possible to get [...]
per one reaction, which means that about 1 kb fragments sharing overlapping bases
are needed for sequencing. For this purpose, 27 subclones were constructed.



Sequencing of the flanked BglII fragments consisting of about 16000 bp revealed 15
complete ORFs. The sequence analysis can be made by any computer based program,
such as the GCG package (Madison, Wisconsin, USA). According to the present
invention the putative gene functions, as deduced from sequence homology to
those available in the libraries, are

    aminotransferase (snogI), not completed

1.  dTDP-glucose synthase (snogJ)

2.  aminomethyl transferase (snogA)

3.  polyketide cyclase (snoaM)

4.  a gene of deoxyhexose pathway, unknown (snogN)

5.  hydroxylase (snoaG)

6.  dTDP-4-dehydrorhamnose reductase (snogC)

7.  dTDP-glucose 4,6-dehydratase (snogK)

8.  NAME cyclase (snoaL)

9.  unknown (snoK)

10. glycosyl transferase, GTF (snogD)

11. unknown (snoW)

12. glycosyl transferase, GTF (snogE)

13. unknown (snoL)

14. unknown (snoO)

15. ketoreductase (snoaF)

    unknown (snoK), not completed



Gene designations: g means that the gene is involved in biosynthesis of the glycosidic
portion, including glycosyl transferases, whereas a indicates that the gene is
needed for the formation of the aglycone moiety.

Considering the proposed biosynthetic pathway for nogalamycin shown in Fig. 3, we
are able to cause several different modifications with the genes
identified, including snoaL, responsible for the cyclization of the fourth ring of the
aglycone moiety while determining the stereochemistry of the anthracyclinone, and
the genes affecting the formation of nogalamine and nogalose (snogJ, snogK, snogN,
snogC, snogA), and, in addition, the genes responsible for joining the sugar residues
to the aglycone moiety (snogD and snogE).

These genes could be separately inserted in a vector using suitable restriction sites, or
by amplifying the genes by PCR. The fragments may contain an intrinsic promoter,
or a promoter may be separately cloned. It is advantageous to use a vector carrying a
promoter to allow expression of the genes in a Streptomyces strain. The plasmid
pIJE486 contains the promoter ermE of the erythromycin resistance gene, allowing
constitutive expression of the genes inserted in the correct orientation. Special attention
is drawn to the gene encoding a cyclase for the aliphatic ring, but any gene of said
cluster may be expressed in Streptomyces hosts. Said cyclase converts the
stereochemistry at C-9 of auramycinone in TK24, if inserted into a plasmid
possessing the other genes for auramycinone biosynthesis, except the cyclase
responsible for the typical stereochemistry of anthracyclines.

Streptomyces strains, in particular S. galilaeus, carrying the recombinant plasmids are
cultivated in media wherein antibiotics are produced. The hybrid compounds are
extracted with organic solvents from the culture broth, and the compounds are
separated and purified using chromatographic techniques.

According to this invention S. galilaeus H039 carrying the plasmid pSY42 and
designated as H039/pSY42 produces aklavinone-4'-epi-2-deoxyfucose in E1
medium supplemented with thiostrepton to give selection pressure for the plasmid
containing strains.

S. lividans TK24 carrying the plasmid pSY15c, containing the genes for the
nogalamycin chromophore and the genes for a cyclase (snoaL) and a ketoreductase
(snoaF), was cultivated in E1 medium supplemented with thiostrepton. The
compound 9-epi-auramycinone was produced, and this structure is now called
nogalamycinone. Any DNA fragment of the invention subcloned from the 17 kb
nogalamycin biosynthesis region can be inserted in a vector replicating in
Streptomyces, and the products may be produced by fermentation of the plasmid
containing strains.

Brief description of the drawings

Fig. 1 shows the structures of nogalamycin, daunomycin and aclacinomycin.

Fig. 2 is a diagram of the gene cluster (Sno5) of the invention for nogalamycin
biosynthesis.

Fig. 3 describes the proposed biosynthesis pathway for nogalamycin.

Fig. 4 shows a diagram of the plasmid pSY15c. The genes snoaL (aL) and snoaF
(aF), shown black, are inserted in the plasmid pSY15 to give pSY15c. aL
represents a cyclase snoaL and aF is for C-7 ketoreductase snoaF. pSY15
(WO 96/10581) generates the production of a tricyclic intermediate for
nogalamycin biosynthesis in S. lividans. The abbreviations a1, a2 and a3
refer to the genes snoa1, snoa2 and snoa3, respectively, for minimal PKS.
rA is the snorA gene for an activator, aB is the snoaB gene for oxygenase,
aC is the snoaC gene for methylase, aD is the snoaD gene for polyketide
ketoreductase and aE is the snoaE gene for aromatase. gF (the snogF gene) and
gG (the snogG gene), involved in the deoxyhexose pathway, are not
functional in the construct. aph is an aminoglycoside phosphotransferase
gene, and tsr is a thiostrepton resistance gene.



Examples to further illustrate the invention are given hereafter. 

EXPERIMENTAL 

Materials used 

Restriction enzymes used were purchased from [...]
or Boehringer Mannheim (Germany), and alkaline phosphatase from Boehringer
Mannheim, and used according to the manufacturers' instructions. Proteinase K was
purchased from Promega and lysozyme from Sigma (St. Louis, USA). Hybond™-N
nylon membranes used in hybridization were purchased from Amersham
(Buckinghamshire, England), DIG DNA Labelling Kit and DIG Luminescent
Detection Kit from Boehringer Mannheim. Qiaquick Gel Extraction Kit from Qiagen
(Hilden, Germany) was used for isolating DNA from agarose.

Bacterial strains and their use 

Escherichia coli XL1 Blue MRF' was used for cloning.

Streptomyces nogalater ATCC 27451; the gene cluster of nogalamycin biosynthesis
was cloned from this strain.

The host strains to express the genes cloned were:

Streptomyces lividans TK24, also used as a primary host to clone DNA propagated in
E. coli.

Streptomyces galilaeus H039, produces aklavinone-rhodinose-rhodinose-rhodinose

Streptomyces galilaeus H026, produces aclacinomycin N, ACMN
(aklavinone-rhodosamine-2-deoxyfucose-rhodinose)

Streptomyces galilaeus H063, produces aklavinone

Streptomyces galilaeus H075, produces
aklavinone-rhodosamine-2-deoxyfucose-2-deoxyfucose

The detailed description of the mutants H039 and H026 is given in Ylihonko et al.
(1994) and of H075 in the FI patent application No. 981062 (Ylihonko et al., 1998).
H063 has not been described in the literature, but it was obtained by NTG
mutagenesis of S. galilaeus, and selected to be used as the host strain in the hybrid
compound production, as it accumulates aklavinone without any sugar residues.

Plasmids 

E. coli - Streptomyces shuttle cosmid pFD666 (ATCC 77286) was used for cloning
the chromosomal DNA. E. coli cloning vectors [...]
were used for preparing the subclones.

pIJ486 is a high copy plasmid vector provided by Prof. Sir David Hopwood, John
Innes Centre, UK (Ward et al., 1986).

pIJE486 is a vector containing the ermE gene in the polylinker of pIJ486 (Bibb et al.,
1985).

pSY15 is a pIJ486 based plasmid construct, wherein the genes of the polyketide pathway
for nogalamycin biosynthesis were cloned (Ylihonko et al., 1996a).

» * 

Nutrient media and solutions

For cultivation of S. nogalater for total DNA isolation TSB medium was used.
Lysozyme solution (0.3 M sucrose, 25 mM Tris, pH 8 and 25 mM EDTA, pH 8) was
used in isolation of total DNA. TE buffer (10 mM Tris, pH 8.0 and 1 mM EDTA)
was used to dissolve the DNA.

TRYPTONE-SOYA BROTH (TSB)

Per litre: Oxoid Tryptone Soya Broth powder 30 g.

ISP4

Bacto ISP-medium 4, Difco; 37 g/l.

E1

Per litre in tap water:

glucose            20 g
soluble starch     20 g
Farmamedia         5 g
yeast extract      2.5 g
K2HPO4·3H2O        1.3 g
MgSO4·7H2O         1 g
NaCl               3 g
CaCO3              3 g

pH adjusted to 7.4 before autoclaving.
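
As a convenience, the per-litre E1 recipe can be scaled to the working volumes mentioned in the examples (60 ml flask cultures and a 13 l fermentor). The following is a minimal sketch; the dictionary and function names are hypothetical, and the amounts are simply those listed above.

```python
# E1 medium composition per litre of tap water, in grams, as listed in the text.
E1_PER_LITRE_G = {
    "glucose": 20.0,
    "soluble starch": 20.0,
    "Farmamedia": 5.0,
    "yeast extract": 2.5,
    "K2HPO4.3H2O": 1.3,
    "MgSO4.7H2O": 1.0,
    "NaCl": 3.0,
    "CaCO3": 3.0,
}


def scale_recipe(per_litre_g: dict[str, float], volume_l: float) -> dict[str, float]:
    """Return the grams of each component needed for volume_l litres of medium."""
    return {name: round(grams * volume_l, 3) for name, grams in per_litre_g.items()}


flask = scale_recipe(E1_PER_LITRE_G, 0.060)    # one 60 ml flask culture
fermentor = scale_recipe(E1_PER_LITRE_G, 13)   # 13 l fermentor batch
for name in E1_PER_LITRE_G:
    print(f"{name:15s} {flask[name]:8.3f} g {fermentor[name]:8.1f} g")
```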



General methods

NMR data was collected with a JEOL JNM-GX 400 spectrometer at ambient
temperature. 1H and 13C NMR samples were internally referenced to TMS.

The anthracycline metabolites were detected by HPLC (LaChrom, Merck Hitachi,
pump L-7100, detector L-7400 and integrator D-7500) using a LiChroCART RP-18
column (4.6 x 250 mm). Acetonitrile:potassium dihydrogen phosphate buffer (60 mM, pH
3.0 adjusted with citric acid) was used as the mobile phase. A gradient system running
from 65% to 30% of potassium dihydrogen phosphate buffer was used to separate
the compounds. The flow rate was 1 ml/min and the detection was effected at 430
nm.

ISP4 plates supplemented with thiostrepton (50 µg/ml) were used to maintain the
plasmid carrying cultures.

Example 1. Cloning the gene cluster for nogalamycin biosynthesis

1.1 Cosmid library

For the isolation of total DNA, Streptomyces nogalater (ATCC 27451) was grown
for three days in 50 ml of TSB medium supplemented with 0.5% of glycine. The
cells were harvested by centrifuging for 15 min at 3900 x g in 12 ml Falcon tubes,
and the cells were stored at -20°C. Cells from a 12 ml sample of the culture were
used to isolate the DNA. 5 ml of lysozyme solution containing 5 mg of lysozyme/ml
was added onto the cells, which were incubated for 20 min at 27°C. 500 µl of 10% SDS
containing 0.7 mg of proteinase K was added onto the cells, which were incubated for 80 min
at 62°C; another 500 µl of 10% SDS containing 0.7 mg of proteinase K was added,
and incubation was continued for 60 min. The sample was chilled on ice, 600 µl
of 3 M NaAc, pH 5.8, were added, and the mixture was extracted with equilibrated
phenol (Sigma). The phases were separated by centrifuging at 1400 x g for 10 min.
The DNA was precipitated from the water phase [...], collected by
spooling with a glass rod, washed by dipping into 70% ethanol, air dried and
dissolved in 500 µl of TE buffer.



The chromosomal DNA was partially digested with Sau3AI. The DNA fragments
were separated by agarose gel electrophoresis, and the fragments of 30 to 50 kb were
cut from the 0.3% low gelling temperature SeaPlaque® agarose. The DNA bands
were isolated from the gel by heating to 65°C and extracting with an equal volume of
equilibrated phenol, and the phases were separated by centrifuging for 15 min at
2500 x g. The phenol phase was extracted with TE buffer and centrifuged, and the water
phases were pooled. The DNA was precipitated by adding 0.1 volumes of NaAc, pH
5.8, and 2 volumes of ethanol at -20°C for 30 min, and centrifuged for 30 min at 15 000
rpm in a Sorvall RC5C centrifuge using an SS-34 rotor with adapters for 10 ml tubes.
The pellet was air dried and dissolved in 20 µl of TE buffer. The isolated fragments
were ligated to the pFD666 cosmid vector digested with BamHI and dephosphorylated.
The DNA was packaged into phage particles and used to infect E. coli using the Gigapack® III
XL Packaging Extract Kit according to the manufacturer's instructions.
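
The protocol above gives some centrifugation steps in x g and one in rpm. The two can be related with the usual formula RCF = 1.118e-5 x r(cm) x rpm^2. The sketch below is illustrative only: the rotor radius is an assumption (a nominal maximum radius of 10.7 cm for an SS-34-type rotor), and the effective radius with the 10 ml tube adapters mentioned in the text is smaller and is not given.

```python
def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) = 1.118e-5 * radius [cm] * rpm^2."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Inverse of rcf_from_rpm: rotor speed needed to reach a given x g."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


# ASSUMPTION: nominal maximum radius; the real value depends on rotor and adapters.
RADIUS_CM = 10.7
rcf_15000 = rcf_from_rpm(15_000, RADIUS_CM)
rpm_2500g = rpm_from_rcf(2_500, RADIUS_CM)
print(round(rcf_15000), round(rpm_2500g))
```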
1.2 Identification of the clones by hybridization

The infected cells were grown on LB plates containing 50 µg/ml kanamycin and
transferred to Hybond™-N nylon membranes (Amersham). The membranes were
handled according to the protocol described in Boehringer Mannheim's manual "The
DIG System User's Guide for Filter Hybridization". The probe used to screen the
colonies for an expanded nogalamycin gene cluster was a 1.07 kb SacI fragment
from the cluster described earlier (Torkkell et al., 1997). The plasmid carrying the
probe was digested with SacI, and the fragment was separated from the vector by
agarose gel electrophoresis and isolated from the gel using the Qiaquick Gel Extraction
Kit (Qiagen). The probe was labelled with digoxigenin using a random primed labelling
system according to Boehringer Mannheim's manual "The DIG System User's Guide
for Filter Hybridization". 5000 colonies were screened by hybridization at 70°C using
the probe described. Positive colonies were detected using the DIG Luminescent
Detection Kit (Boehringer Mannheim). Seven colonies gave a positive signal.
Cosmids from the positive clones were isolated from a 5 ml culture by [...]
method. Restriction analysis showed that the cloned fragments overlapped each other,
representing at least 60 kb of continuous DNA. The positive clones obtained were
designated as pFDSno1 to pFDSno7.

1.3 Subcloning the fragments for sequencing

Clone No. 5, designated as pFDSno5, was digested with BglII, and for subcloning
two fragments of about 10 kb and 7 kb were isolated and ligated to pSL1190
digested with BglII and dephosphorylated. The plasmids obtained were named
pSn42 and pSn43, respectively. These two fragments cover the DNA region flanking
the previously characterized area of the nogalamycin biosynthesis cluster. To
determine the nucleotide sequence of the whole 17 kb region cloned in pSn42 and
pSn43, convenient restriction sites were used to subclone the fragments into the
vector pUC19 or pSL1190, giving 16 subclones from the insert of pSn42 and 11
subclones from pSn43.

E. coli XL1 Blue MRF' cells were cultivated overnight at 37°C in 5 ml of LB
medium supplemented with 50 µg/ml of ampicillin. To isolate plasmids for
sequencing reactions, the Wizard Plus Minipreps DNA Purification System kit of
Promega, or the Biometra silica spin plasmid miniprep kit of Biomedizinische Analytik
GmbH, was used according to the manufacturers' instructions.

DNA sequencing was performed using the automatic ABI DNA sequencer
(Perkin-Elmer) according to the manufacturer's instructions.



1.4 Sequence analysis and the deduced functions of the genes 

Sequence analyses were effected using the GCG sequence analysis software package
(Version 8; Genetics Computer Group, Madison, Wisconsin, USA). The translation
table was modified to accept also GTG as a start codon. Codon usage was analysed
using published data (Wright and Bibb 1992).
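
The effect of accepting GTG as a start codon can be illustrated with a very small open reading frame scanner. This is only a sketch under stated assumptions: it is not the GCG software used in the text, it scans the forward strand only, it ignores codon usage statistics, and the function name and toy sequence are hypothetical.

```python
STARTS = {"ATG", "GTG"}          # GTG accepted as an alternative start codon
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 1-based inclusive, of forward-strand ORFs.

    For each stop codon only the longest ORF (earliest in-frame start) is
    reported. The end coordinate includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3   # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3))
                start = None
    return sorted(orfs)


toy = "CCGTGAAACCCTAGATGTTTGGGTGA"
orfs = find_orfs(toy)
print(orfs)
```

In the toy sequence the first reading frame found begins with GTG and would be missed by a scanner that accepts only ATG, which is the reason for modifying the translation table for GC-rich Streptomyces DNA.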

According to the CODONPREFERENCE program the sequenced DNA fragment
contained 15 complete open reading frames (ORFs), and the 5' ends of two other
ORFs at both ends of the fragment according to the invention. The functions of
the genes were concluded by comparing the amino acid sequences translated from
the nucleotide sequences with known sequences in the libraries. The results
are shown in Table 1. The positions given refer to the appended sequence listing.
The amino acid sequences of the peptides are given in SEQ ID NO:2 to SEQ ID
NO:18.

Table 1

Gene    Position             Amino acids       Deduced function                       Remarks
                             (SEQ ID NO)

snogI   -1027, compl         >342 (2)          aminotransferase                       5' end
snogJ   1192-2073            293 (3)           dTDP-glucose synthase
snogA   2106-2822, compl     238 (4)           aminomethyl transferase
snoaM   2826-3800, compl     324 (5)           a polyketide cyclase
snogN   3799-5025            408 (6)           dnrQ homology (Otten et al.,
                                               1995), unknown
snoaG   5088-6356            422 (7)           hydroxylase
snogC   6334-7209, compl     291 (8)           dTDP-4-dehydrorhamnose reductase
snogK   7245-8297, compl     350 (9)           dTDP-glucose 4,6-dehydratase
snoaL   8537-8941            134 (10)          NAME cyclase (nogalonic acid
                                               methyl ester)
snoK    8992-9699            235 (11)          unknown
snogD   9745-10917, compl    [illegible] (12)  glycosyl transferase
snoW    11057-11884          275 (13)          unknown
snogE   11928-*              >424 (14)         glycosyl transferase
snoL    13335-13754, compl   139 (15)          unknown
snoO    13974-14441          155 (16)          homologous to mtmX of
                                               mithramycin cluster
snoaF   14532-15377          281 (17)          C-7 ketoreductase, analogous to
                                               aklaviketone ketoreductase
snoW    [illegible]          [illegible]       unknown                                5' end

*, nucleotide sequence of about 100 bp, not known
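
The positions and amino acid counts in Table 1 are related by simple arithmetic: a complete ORF spanning positions start to end contains (end - start + 1) / 3 codons, one of which is the stop codon. The sketch below checks this for rows whose figures are clearly legible in the source; the function name is hypothetical, and rows with partly illegible entries are deliberately left out.

```python
# (gene, start, end, amino acids) for legible complete ORFs of Table 1.
TABLE1_ROWS = [
    ("snogJ", 1192, 2073, 293),
    ("snoaM", 2826, 3800, 324),
    ("snogN", 3799, 5025, 408),
    ("snoaG", 5088, 6356, 422),
    ("snogC", 6334, 7209, 291),
    ("snogK", 7245, 8297, 350),
    ("snoaL", 8537, 8941, 134),
    ("snoO", 13974, 14441, 155),
    ("snoaF", 14532, 15377, 281),
]


def residues_from_span(start: int, end: int) -> int:
    """Number of amino acids encoded by an ORF whose span includes the stop codon."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a whole number of codons")
    return length // 3 - 1


mismatches = [
    gene for gene, start, end, aa in TABLE1_ROWS
    if residues_from_span(start, end) != aa
]
print(mismatches)
```

The same relation holds on the complementary strand, since only the length of the span matters.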



1.5 Expression cloning

The 10 kb BglII fragment from pFDSno5 was cloned into the plasmid pIJ486 and the
plasmid obtained was named pSY42. Correspondingly, the 7 kb BglII fragment
from pFDSno5 was cloned into the plasmid pIJE486, and the plasmid pSY43 was
obtained. Plasmid pSY42 was introduced into S. lividans strain TK24 by protoplast
transformation, isolated from it and further introduced into the S. galilaeus mutant H039,
and after propagation in H039, transferred to other S. galilaeus mutants blocked in
the deoxyhexose pathway for the characteristic sugars of aclacinomycins (H075, H026,
and H063). E1 medium was used for anthracycline production, and the products were
extracted from the culture with toluene:methanol (1:1) at pH 7. Anthracycline
metabolites were analyzed by HPLC. The products of the mutants H039, H026, H063
and H075 carrying pSY42 differed from those obtained from the mutants without the
plasmid.

According to the sequence analysis pSY42 contained a cyclase designated as
NAMEC (nogalonic acid methyl ester cyclase), and in pSY43 a ketoreductase gene
was identified. Expression constructions were prepared which contained all the genes
needed for the formation of the nogalamycin aglycone. A 1.4 kb BamHI-SacI fragment
from pSY42 (containing NAMEC) and a 1.1 kb MluI-KpnI fragment from pSY43
carrying the gene for a ketoreductase of the C-7 keto group were ligated to pSY15
linearized by SacI, to form the plasmid pSY15c (Fig. 4). Plasmid pSY15c was
introduced into S. lividans TK24, and the strain TK24/pSY15c was cultivated in E1
medium supplemented with thiostrepton. An aglycone compound was produced, and
this structure is now called nogalamycinone.



Example 2. Compounds generated by the Sno5-cluster

2.1 Production and purification of the products derived from H039/pSY42
and TK24/pSY15c

The seed culture, 180 ml of E1 culture of the plasmid containing strain, H039/pSY42
or TK24/pSY15c, was obtained by cultivating the strain in three 250 ml Erlenmeyer
flasks containing 60 ml of E1 medium supplemented with thiostrepton (5 µg/ml) for
four days at 30°C, 330 rpm. The combined culture broths (180 ml) were used to
inoculate 13 l of E1 medium in a fermentor (Biostat E). Fermentation was carried out
for seven days at 28°C (330 rpm, aeration: 450 l/min).

The cells were harvested by centrifuging. 2.6 l of methanol was used to break the
bacterial cells and to extract the anthracycline metabolites accumulated [...].
The metabolites were extracted with dichloromethane at pH 6. The
organic layer was evaporated to dryness. The viscous residue was flashed through a
polyamide (11) column using water:methanol from 1:9 to 0:10 as the eluent. Pooled
fractions containing the compounds were further purified on a Merck-Hitachi HPLC
using a preparative reversed phase column (LiChroCART RP-18, 5 µm) with the mobile
phase acetonitrile:1% AcOH in water (1:1). Evaporation of acetonitrile gave pure
products as yellow powders, which were dried under vacuum.

Structural elucidation of the compounds derived from H039/pSY42 and
from TK24/pSY15c

NMR analysis included [...]. Protons were assigned using [...] techniques, and
carbons using [...] and HMBC techniques.

As deduced from the data given in Tables 2 and 3, the structures revealed were
aklavinone-4'-epi-2-deoxyfucose from the culture of H039/pSY42, and 9-epi-
auramycinone (= nogalamycinone) from the culture of TK24/pSY15c. The chemical
structures of the compounds are shown below in Formula I and Formula II,
respectively.




[Structural formulae (I) and (II)]

Deposited microorganisms

The following microorganisms were deposited according to the Budapest Treaty at
Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ),
Mascheroder Weg 1b, D-38124 Braunschweig, Germany.

Microorganism                   Accession number    Date of deposit

S. lividans TK24/pSY42          DSM 12451           14 October 1998
carrying the plasmid pSY42

S. lividans TK24/pSY43          DSM 12452           14 October 1998
carrying the plasmid pSY43

Table 2. 1H and 13C assignations of the compound aklavinone-4'-epi-2-
deoxyfucose (Formula I).

Site    1H                                  13C

1       7.74, 1H, dd, 7.5, 1.3              120.1
2       7.68, 1H, dd, 8.4, 7.5              137.3
3       7.27, 1H, dd, 8.3, 1.3              124.6
4       -                                   161.9
4-OH    11.70, 1H, s
4a      -                                   115.[?]
5       -                                   192.3
5a      -                                   114.4
6       -                                   162.4
6-OH    12.46, 1H, s
6a      -                                   130.9
7       5.18, 1H, dd, 4.3, 3.1              71.3
8A      2.51, 1H, dd, 15.0, 4.3             33.9
8B      2.32, 1H, dd, 15.0, 3.1
9       -                                   72.1
9-OH    4.58, 1H, s
10      4.02, 1H, s                         56.9
10a     -                                   142.4
11      7.40, 1H, s                         120.8
11a     -                                   133.1
12      -                                   180.7
12a     -                                   132.6
13A     1.73, 1H, dq, 14.2, 7.4             32.0
13B     1.51, 1H, dq, 14.2, 7.4
14      1.10, 3H, t, 7.4                    6.7
15      -                                   171.1
16      3.69, 3H, s                         52.5
1'      5.41, 1H, d, 3.5                    101.7
2'a     1.75, 1H, ddd, 12.8, 11.2, 3.4      37.7
2'e     2.19, 1H, dd, 12.8, 5.3
3'      3.71, 1H, ddd, 12.0, 9.0, 5.3       69.0
4'      3.14, 1H, dd, 9.1, 9.0              78.1
5'      3.88, 1H, dq, 9.1, 6.2              68.8
6'      1.36, 3H, d, 6.2                    17.6

Table 3. 1H and 13C assignations of 9-epi-auramycinone (Formula II).

Site    1H                                  13C

1       7.76, 1H, dd, 7.5, 1.2              119.8
2       7.67, 1H, dd, 8.3, 7.5              137.4
3       7.28, 1H, dd, 8.3, 1.2              124.0
4       -                                   162.5
4-OH    11.86, 1H, s
4a      -                                   115.6
5       -                                   192.7
5a      -                                   114.6
6       -                                   160.9
6-OH    12.76, 1H, s
6a      -                                   134.1
7       5.40, 1H, t, 7.0                    64.0
8A      2.66, 1H, dd, 13.9, 7.0             40.9
8B      1.89, 1H, dd, 13.9, 7.1
9       -                                   70.5
9-OH    3.49, 1H, br s
10      3.93, 1H, d, 0.8                    56.0
10a     -                                   142.1
11      7.51, 1H, d, 0.8                    120.1
11a     -                                   133.3
12      -                                   180.9
12a     -                                   132.1
13      1.44, 3H, s                         28.7
14      -                                   173.0
15      3.90, 3H, s                         52.6

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: Galilaeus Oy
(B) STREET: Kairiskulmantie 10
(C) CITY: Piispanristi
(E) COUNTRY: Finland
(F) POSTAL CODE (ZIP): FIN-20760

(ii) TITLE OF INVENTION: Gene cluster involved in nogalamycin
biosynthesis, and its use in production of hybrid
antibiotics

(iii) NUMBER OF SEQUENCES: 18

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16020 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:
(B) STRAIN: Streptomyces nogalater ATCC 27451

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: complement (1..1027)
(D) OTHER INFORMATION: /function= "aminotransferase"
/gene= "snogI"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1192..2073
(D) OTHER INFORMATION: /function= "dTDP-glucose synthase"
/gene= "snogJ"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: complement (2106..2822)
(D) OTHER INFORMATION: /function= "aminomethyl transferase"
/gene= "snogA"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: complement (2826..3800)
(D) OTHER INFORMATION: /function= "polyketide cyclase"
/gene= "snoaM"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 3799..5025
(D) OTHER INFORMATION: /function= "unknown"
/gene= "snogN"
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 5088..6356
(D) OTHER INFORMATION: /function= "hydroxylase"
/gene= "snoaG"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: complement (6334..7209)
(D) OTHER INFORMATION: /function= "dTDP-4-dehydrorhamnose
reductase"
/gene= "snogC"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: complement (7245..8297)
(D) OTHER INFORMATION: /function= "dTDP-glucose-4,6-dehydratase"
/gene= "snogK"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 8537..8941
(D) OTHER INFORMATION: /function= "NAME cyclase"
/gene= "snoaL"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 8992..9699
(D) OTHER INFORMATION: /function= "unknown"
/gene= "snoK"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: complement (9745..10917)
(D) OTHER INFORMATION: /function= "glycosyl transferase"
/gene= "snogD"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 11057..11884
(D) OTHER INFORMATION: /function= "unknown"
/gene= "snoW"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 11928..13200
(D) OTHER INFORMATION: /function= "glycosyl transferase"
/gene= "snogE"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: complement (13335..13754)
(D) OTHER INFORMATION: /function= "unknown"
/gene= "snoL"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 13974..14441
(D) OTHER INFORMATION: /function= "homologous to mtmX of mithramycin
cluster"
/gene= "snoO"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 14532..15377
(D) OTHER INFORMATION: /function= "C-7 ketoreductase"
/gene= "snoaF"

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 15450..16020
(D) OTHER INFORMATION: /function= "unknown"
/gene= "snoN"

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 3799..3800
(D) OTHER INFORMATION: /note= "overlapping sequence in the
genes snoaM and snogN"

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 6334..6356
(D) OTHER INFORMATION: /note= "overlapping sequence in the
genes snoaG and snogC"

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 13201..13300
(D) OTHER INFORMATION: /note= "unknown_region"
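
The CDS coordinates listed above can be checked for internal consistency with a few lines of Python. The following is only an illustrative sketch, not part of the original listing: the names CDS_FEATURES and summarize are hypothetical, and only those entries whose coordinates are clearly legible in the feature list are included. For each entry it reports the length in base pairs and whether that length is a whole number of codons.

```python
# CDS coordinates copied from the legible (ix) FEATURE entries of SEQ ID NO: 1.
# "-" marks entries listed as complement(...). Entries whose coordinates are
# garbled in the source text are deliberately left out (assumption of this sketch).
CDS_FEATURES = [
    ("snogI", 1, 1027, "-"),
    ("snogJ", 1192, 2073, "+"),
    ("snogA", 2106, 2822, "-"),
    ("snoaM", 2826, 3800, "-"),
    ("snogN", 3799, 5025, "+"),
    ("snoaG", 5088, 6356, "+"),
    ("snogC", 6334, 7209, "-"),
    ("snogK", 7245, 8297, "-"),
    ("snoaL", 8537, 8941, "+"),
    ("snoW", 11057, 11884, "+"),
    ("snogE", 11928, 13200, "+"),
    ("snoL", 13335, 13754, "-"),
    ("snoaF", 14532, 15377, "+"),
    ("snoN", 15450, 16020, "+"),
]


def summarize(features):
    """Return (gene, strand, length_bp, whole_codons, leftover_bases) per CDS."""
    rows = []
    for gene, start, end, strand in features:
        length = end - start + 1  # coordinates are 1-based and inclusive
        rows.append((gene, strand, length, length // 3, length % 3))
    return rows


for gene, strand, length, codons, leftover in summarize(CDS_FEATURES):
    note = "" if leftover == 0 else "  <- not a whole number of codons"
    print(f"{gene:6s} {strand} {length:5d} bp {codons:4d} codons{note}")
```

With these coordinates, snogJ spans 882 bp, that is 293 codons plus a stop codon, which agrees with the 293 amino acids stated for SEQ ID NO: 3. Only snogI, snogE and snoN leave a remainder; these entries touch position 1, the unknown region starting at 13201, and position 16020 respectively, which would be consistent with reading frames that are incomplete in the listing.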



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 





AGATCTCGTC CGCCAGTGCC TCGGTGACCG GCAACGAGCC CTTGGCGTAG CCGAGATGGG 60
AGAAACCGGT CATGGTGTGC ACGGGCCAGG GATAACTGAT GTTGAGGGCG ATGTCGTAGG 120
AGGCGCGCAG GGCCTCCAGC ACCGCGTCCC GTCGCGGATG GCGCACCACG TACACGTAGT 180
AGACGTGCTC GTTGCCCTGC GCGGTCCTCG GCAGCAGCAG CCCCGTGTCC GCCAGGCCCT 240
CCTCATAGCG GCGTGCCACC GCCCGGCGGG CCTCGATGTA GGACGGCAAC CGGGACAGCT 300
TGCGCCGCAG GATCTCTGCC TGTACTTCGT CCAGCCGGCT GTTGTGCCCG GGGGTTTCGA 360
CGACGTAGTA GCGGCTCTCC ATGCCGTAGT AGCGCAGCCG CCGCAGCCGG TCCGCCACCC 420
GCTCGTCGTC GGTGAGCACC GCGCCGCCGT CCCCGTACGC GCCCAGCACC TTGGTCGGGT 480
AGAAGGAGAA CGCGGCCGCG TCACCGGTCG AGCCGGCGAG TCGGCCGTGC CGGCGCGCCC 540
CGTGCGCCTG CGCGCAGTCC TCCAGGATCA CCAGGTTGTG CCGGGCGGCC AGATCGCGCA 600
GCGGTGCCAT GTCCACGCAC TGCCCGTAGA GGTGGACCGG CAGCAGACAC CGGGTGCGTG 660
GCGTGAGGAC GGCCTCCACC TGGGACGTGT CCATCAGGTA GTCCTCCTCG CGCACGTCCA 720
CGAAGACGGG CGTGGCACCG GCCGAGTCGA TCGCGACGAC CGTGGGCGCG GCGGTGTTGG 780
ACACGGTGAC GACCTCGTCG CCGGGCCCGA CACCCAAGGC CTGTAACCCC AGCTTGACGG 840
CGTTGGTCCC GTTGTCGACG CCGACGGCAT GTCCGACGCC CTGGAATGAG GCGAACTCGG 900
ACTCGAAGCC GCGCACGCTC TCACCGAGGA CGAGCCGGCC GGAGCGGAAC ACCGTCTCCA 960
CGGCATCGTG GATGTCCTCG CGTTCCAGCT CGTATTCCGG CAGATAGTCC CACACGTGTA 1020
CGGTCATCGA GCCCCTCCGG GATTCTCCCT GCGAAAAGTC ACCACTCTAC GACAACGTTC 1080
ACCACTCGCT TTTTCCTCAA CGTCCGCTTG AGACGGCCCG GCCTGCTGTG GCCCGGGGAA 1140
AGGTGCGGTC GTTATCATCG ACTCCGTCTT CTCATTCGGA GGTTGTTCAG GGTGAAGGGA 1200
ATCATTCTCG CCGGGGGTAC GGGGAGCAGG CTCCACCCGA CGACTCTCGC GGTGTCCAAG 1260








CAGCTTCTCC CCGTCGGGGA CAAGCCGATG ATCTACTACC CGCTCTCCGT GCTGATGCTG 1320
GCCGGCGTCA CGGACATCCT CATCATCAGC ACACCGCACG AACTCCCCCG AATGCGCCGT 1380
CTGTTCGGCG ACGGCGCACA GCTCGGACTC CGCCTGGCCT ACGCCGAGCA GGAGAAACCC 1440
AGGGGTATCG CCGAGGCGTT CCTGATCGGT GCCGACCACG TGGGAAGCGA TGCCGTTGCG 1500
CTGGCGCTGG GCGACAACAT ATTCCACGGG AGTTCTTTTC AGGGGGTGCT GCGCAAGGAA 1560
GCCGAGGAAT TGGACGGGTG TGTCCTGTTC GGTTATCCGG TCAAGGATCC CCAGCGTTAT 1620
GGAGTCGGCG AGGCGAACGC GTCCGGGCGG CTCGTCTCCA TCGAGGAGAA ACCGGTAGGC 1680
CCCCGCTCCA ACCGGGCCAT CACCGGACTC TATTTCTACG ACAACGAGGT GGTGGACATC 1740
GCCCGGCGGC TGCGCCCCTC CGCCCGCGGC GAACTCGAAA TCACCGACAT CAACCGTACC 1800
TACATGGAAC GAGGCCGGGC CCGGCTCGTG GACCTGGGCC GGGGATTGGC CTGGCTCGAC 1860
ACCGGCACAC CCGAGTCACT CCTGCAGGGC TCGCAGTACG TGTCCGCCCT GGAGGAACGC 1920
CAGGGCATCA GGATCGCCTG CATCGAGGAG GTGGCCCTCC GCATGGGCTT CATCAACGCC 1980
CAGGCCTGCT ACGAACTGGG CGCGCGCCTG TCCGGCTCCG GCTACGGGCA GTACGTGATG 2040
GCCATCGCGG AGGAGTGCAC GGGGCGGGTG TGAGCGGCCG TGCGGGGTGG GCGAACGGCC 2100
CGGCCTTACC CGGCJCCCGGG CAeCCGGACG AACAACGCCC GGCCGGTCAG CCCGTCGTCC 2160
AGGAACTCGG CCGGGCAGGG CGeGTCCTCG AACGCGGCGA GGTACTCCTC CCTGGE?GAAC 2220
AGGGTGAGGA GGTiiGGATGTC eGTGAAGTCG CGTATGGCGG TGGCGTGGGG GACGAGGAAC 2280
CGCACCTCCA TGGGGGTGGT GCGGGGCTGC CTGGTGGAGJF^GGGAGi=^GGCG"GGGGAGGGTC 2340
CGGCCCTCAC CGCGTGCCAG GTdCCCGGGG ACGTAGCCCT CCAGGAAGCG CTCGGGGAAC 2400
CACCAGGGCT CGAGGAGGAG CACGGGGCGC GGCACGAGGT GCGCGGGGAT GGTGCGCACC 2460
GCCGCCCGCA TGTCCGCGAC GGTCTCCAGA TACGGGATGG AGCAGAACAG GCAGACCACG 2520
GCGTCGAAAC GCCCGCTCAG GGCGAAGTCG CGCATGTCCC CGGGCCGCAC CGGCACCGCC 2580
GGCAGCCGCC GTTCGGCCAG GGCCCGCATC TGGTCCGACA GCTCCAGGCC CTCGGTGTGC 2640
GCGAACAGCC GGGGGAAGGC CTCCAGATGG GCGCCGGTGC CGCAGGCGAC GTCGAGCAGC 2700
GAACGCGCCC CGGGCCGAGG GGACCTGATC TGCGCGGTGA CCCGTTGGGC CTCGTCCGCC 2760
CAGCTCTTTC CCCGGCTGCG GTAGACCATC TCGTACACGT CCGCCAGTTC CCGGCCGTAC 2820
ACGCGTCAGT CCTCGTCCAC CAGGGCGACC GCCCGGGTCC ACCCGGCGGC GGCGCCGGCG 2880
ACCTTGACCG GGAAGGAGGA GAGGGGGAAG CCGAAGGAGA GCGGGAGGGG GTGGAGG;r-TC 2940
GCCAGCCGCT CGAa?GTGGGAti :GTAGTCGCGG^TCGGGGGCeA;iGCAGGa?G@GGtjfc,GGGGG^ 3000
ACCGATCGGT CGGGG^G*GGS^GGGGT-^^^eGGP&CGATGAg^G/T4«GG^ 3060
CTGAAGGCAT CGGTCCCGAT CACCCGGACC CCGTGGTCGA GAAGCATCCG TAGCGCGGGC 3120
CCGTCGAGAC CGGCGAAGTC CGTGAAGTAG CGCGGGGTGC CCGCGTGCCG CTGGGCACCG 3180
GTGTGCAGCA GCACGATGTC CCCGGGCCGC AACGCGCACC CGGTCCGGGC CAGTTCCTTC 3240
TCCAGGCGCG CGGCGCTCAC GGTGCCCGTC GGAGCGTCGG TGAGGTCCAG CACCACCCCG 3300










CGCCCGT^GA ACCACTCCAG CGGCATCTGG TCGATGTGGC GGGGGACGCC GTCCCCGTAC 3360
AGCGCGCGCG AACCATAGTG CGACGGCGCG TCGACGTGCG TGCCGGTGTG CGTGGTCAGC 3420
GTGATCCTGT CCAGTGACAG GAACTCGCCG TCCGGCAGTT CGTCCGGAGA GAACTCGACA 3480
CCGAAGTGCT CGCGCATCTC CGCGCACATG TGTTCCGCGC CCTGCCGGGG CGTGAGGACG 3540
TCGTGCACCA CCGGGTCGGG CTGGTACTGT GAGGAATCCA CCGGTGACGA AAGGTCGATG 3600
AGCCGCACGC GCACCTCCGG GTTCGTAGAC GGGCTCGGCT GACGCAGCGC GGGTACGACG 3660
CTGACACGCC CCTCTTGACG TGGCCTGGAA GCTGGTTCGA CGGGCGGGCA CCGCACGCGA 3720
CGGCCGGCGC CGCACCGGCG CCGTCCCGGC CGAGCGGGAA TCCAGGGAGG GTATAGCGGC 3780
GCGCCCCACG CTGCCGTCAT GGTGATGJ^AA CTGACGGACA GCGAGCTGGG GCGTGCGCTG 3840
CTCTCGCTGC GTGGTTACCA GTGGCTCCGC GGCATCCACC ACGATCCCTA CGCCCTGCTG 3900
CTGCGCGCCG AGAGCGACGA TCCGGCGCAG CTCGGCCGGC TGCTGCGTGA ACGCGGCCGG 3960
CTCCACCGCA GCGACACCGG CACCTGGGTC ACCGCGGACC ATGCGACGGC CTCCCGGCTG 4020
CTCGCCGACC CGCGCTTCGT GCTGCGCCGC CCGCCGGCCG GGCCCGCCAC CGGCACCGGG 4080
GACGTCATGC CGTGGGAAGA GGCCACGCTG AGCGACCTGC TGCCCCTCGA CGAGGCGCGC 4140
CTGACGACCG ACCGGQCACG GTGCCGGCGG CTCGGCGCGA CCGCCGCGCG GATCGCGGCG 4200
GACGGTCCCG TCGCGACGCG ACTCGCGGAC CTGGCCGGGG CCCGAGCCGA ACAGGTGCGC 4260
TCAACGGGCC ACTTCGACCT CAGGGCCGAC TACGCCCTCC CGTACGCGGT CGAGCCGGCC 4320
TGCGCGCTGC TCGGCCTGCC GGCCGGGCAG TGTTCCCTCT TCGGCGCCTT CTCCCCGGCC 4380
GTCCTGCTCG ACGCGACGGT CGTACCGCCC CGCCTTCCGG AGGCGCGCGC CCTGATCGCC 4440
TCCACGGCGG AACTGACCGC CCTCTGGCCG CGGCTGGCCC CGAGCCTGTC GAAGACCGTC 4500
CCGGAGGACG AAGCGCCGGA CCTCTTCCTG CTGACGGCCG TGTTACTCGT ACCGGCCGTC 4560
GTCCACCTGG TCTGCGAGGC GGTCGCCGCC CTGTCGCACG ACCCCGGGCA GGCCGGGCTG 4620
CTCAGGGACG ACCCGGTACT CGCCGCACCG GCGGTCGAGG AGACGCTGCG CCACGCACCG 4680
CCCGCCCGTC TGTTCACCCT CCACGCGACC GGACCGGAGC GCGTCGCGGA CGTCGACCTC 4740
CCCGCGGGCG CCGAGGTCGC CGTCGTCGTG GCGGCGGCGC ACCGCGATCC CTCCTGGTGC 4800
CCGGACCCCG ACCGCTTCGA CCTCACCAGG AACGAGCGGC ATCTGGCACT GCCGCCGGAT 4860
CTGCCGCTGG GGGCGCTCGC CCCGCTGCTG CGCGTCTGCG CGACCGCGGC CGTCGCGGCC 4920
CTCGCGGCCG GACTCCTCCC GCTGCGGGCC GTCGGCCCGC CCGTACGACG GCTGCGTGCC 4980
CCGGTCACCC GGTCCGTGCT GCGCTTCCCC GTCGCCCCGT GCTGAGCAGC CCCTCCTCAC 5040
GTCATCCCCG GCCCGCCTTC CCCCGCCCGC AACGGAAGGG ACTCTCCATG GACAACCGCG 5100
AGACCGTACG ACGGGTGAGC GTCTGCCGGG TCTGCGGCGG CAACGACTGG CAGGACGTCG 5160
TGGACTTCGG TGACGTTCCC CTCGCCAACG GCTTCCTGTC CCCGGCCGAC TCCTACGAGA 5220
ACGAGCGCCG CTACCCGCTG GGCGTCCTGT CCTGCCGCGC CTGCCGGCTG ATGAGCCTGA 5280
CCCACGTGGT CGACCCCGAG GTGCTGTACC GCGACTACGC CTACACCACC CCCGACTCCG 5340
AAATGATCAC CCAGCACATG CGGCACATCA CCGCGCTGTG CCGCACCCGT TTCGAGCTTC 5400
CCCCGGACAG CCTCGTCGTG GAGCTGGGCA GCAATACCGG CCGTCAGCTC ATGGCCTTCC 5460
GCGAAGCGGG GATGCGCACC CTGGGCGTGG ACCCCGCGCG GAACCTCACG GACGTCGCCC 5520
GGCGCT^CGG CATCGAGACC TTCCCCGACT TCTTCTCCCA CGACGTGGCC CGCACCATCC 5580
GGCGCGACCA CGGGCAGGCG CGGCTCGTGC TGGGACGGCA TGTCTTeGee CACATCGACG 5640
ACGTGTCGGA CATCGCGGCC GGCGTACGCG AACTCCTGTC TCCCGAGGGG GTGTTGGCGA 5700
TCGAGGTGCC GTACGTTCTG GACCTGCTGG AGAAGGTCGC GTTCGACACC ATCTACGACG 5760
AGCACTTGTC GTACTTCACC ATGCGGTCCT TCGTCACCCT CTTCGCGCGC CACGGGCTGC 5820
GGGTGCTCGA CGTGGAGCGG TTCGGCGTGC ACGGCGGATC GGTCCTCGTC TTCGTGGGCC 5880
ACGAGGACGG CCCCTGGCCC GAACGTCCCT CCGTCCCCGA ACTGCTGCGC GTGGAACGGC 5940
AGCGGGGCCT CTACGACGAC GCCACCTACC GCACGTTCGC GCAGCGGATC GAGCGGGTGC 6000
GCACCGAACT GCCGGAACTG CTGCGCTCCC TCGTGGCCCA GGGCAAGCGC ATCGTCGGCT 6060
ACGGTGCTCC GGCCAAGGGC AACACCATCC TCACGGTGTG CGGGCTCGGC CTGAAGGAGC 6120
TGGAATACTG CACCGACACC ACCGAGCTGA AGCAGGGCAG GGTGCTGCCC GGCACCCACA 6180
TACCGGTGCA CGCTCCCGAG CACGCCAAGG AACACATCCC CGACTACTAC CTGTTGCTCG 6240
CCTGGAACTA CGCCACGGAG ATCCTCG;AGA^GGAGACGGC CTTCCGGGAC~:fiA^GGCGGCC 6300
GGTTCATCGT GCCCATCCCC CGCCCGTCGA TCCTCACGTC CCCGTCAGGT TCCTGAGGCG 6360
CCCGCCGGGC AGCAGCTGAC GCATCGCCTC GCGCAGGGCT GCACGCCAGT CGCGGGGCGG 6420
TGCGACGCCG ACCAGCCGCC AGCGGTCGTG CCCGAGCACC GTGCACGCCG GCCGGGGCGC 6480
CGGGCCCGGC CGGTCGGCCG TCGCCACCGG GCGCACCCGT TCCGGGTCCG CGCCCGCCAG 6540
CCGGAACACC TCCCGGGCCA GCTCGTACCA GGTGGCCGCC CCGGCGTTGG TGGCGTGGAA 6600
GATCCCGCGC GCCCGGTCTG GCGGCGTGCG GGCCAGCGTC ACCAGCAGCC GGGCCACGTC 6660
ACCGGCCCAC GTCGGCTGCC CCCACTGGTC GTTGACGACG TCGACATGGC CGTCGTCCGG 6720
GGCACGCTCC AGCATCGTGC GCACGAAGCT GCGGCCCTGC CCGCCGTAGA GCCACGCCGT 6780
GCGCACCACG GTGCCCGTAT CCGGCAGCAG CGACAGCACG GCCCGTTCCC CGGCCAGTTT 6840
GCTGCGGCCG TACACCGTGC GCGGGCCCGG AGCGTCCGAC TCGCCGTAAG GGCTGCGGGT 6900
GTCGCCCGGG AAGACGTAGT CGGTCGAGAC GTGGATCAGC CGTACGCCGT GGCGCGCACA 6960
GCGGCGGGCC AGCAGCCGGG GCCCGCCGCC GTTGACGCGC ATCGCCTCCG CCCACCGCGA 7020
CTCGGCGCCG TCCACGTCCG TGAAGGCGGC GCAGTTGACC ACCACGeGGG GCCGGTGCGC 7080
GGCGAACGCG GCGTCCACCG CCCGGCCGTC GGTGATGTCC AGGGCGGGCG GCCCGAGTAC 7140
CACCGCCTCG GCGGCGGGCC GGCTCCTGCC GGTCTCCGCC AGGGCCGCGG TCAGGTGCCG 7200
GGCGAGCATG CCTTCTCCTC CGGTGACCAG CACGCGCATC CCGCTCACCG GACCCCGGGG 7260
ACGACGGTGG ACGTACCGCC CGGCGCCGTG ACTCCCCGCT TGAGCGGCTC CCACCAGGAC 7320
CGGTTCTCGC GGTACCACTG GACCGTCGAG CGCAGCCCCG AGGAGAACTC CCGCGCCGGA 7380










CGGTAGCCCA GTTCCTCACG GGCCCTGCCC CAGTCCAGGC TGTAACGCAG GTCGTGCCCC 7440
TTGCGGTCGG GCACGTGCCG GACGCTGCTC CAGTCCGCCC CGCACAGCTC CAGCAACATA 7500
CCCACCAGCT CCCGGTTGGA GAGCTCCCGG CCGCCGCCGA TGTGGTACAC ACCGCCGGGC 7560
CGGCCCGCGG TGCGCACCAG GTCCACGCCC CGGCAGTGGT CCTCCACGTG CAGCCACTCC 7620
CGCACGTTCC GCCCGTCCCC GTACAGCGGC ACCGGCAGCC CGTCCAACAA GTTGGTGACG 7680
AAGCGCGGGA TGAGCTTCTC CGGGTGCTGA CGCGGGCCGT AGTTGTTGGA ACAGCGGGTC 7740
ACCCGCACGT CCAGGCCGTG CGTGCGGTGG CAGGCGAACG CCATCAGGTC GGCCGACGCC 7800
TTGGAGGCGG CGTACGGGGA GTTGGGGCTC AGCGGGTGCT CCTCCGGCCA GGAACCGGAC 7860
GCGATGGAGC CGTAGACCTC GTCCGTGGAC ACCAGGACGA AGGGCTCCAC GCCGTGGCGC 7920
AGCGCGGCGT CCAGCAGCCG CTGGGTGCCG ACGACGTTGG TCAGCACGAA GTCGTCGGCC 7980
GCGCGGATGG ACCGGTCGAC GTGCGACTCC GCGGCGAAGT GGACGACCTG GTCGCTGTGT 8040
GCCATCAGCT CGTCGACCAG CTCGGCGTCG AGGATGTCGC CCCGCACGAA G1:GCAGCCGG 8100
TCACCGCGTA CCGCGTCCAG GTTCGTGAGG TTGCCCGCGT ACGTCAGTTT GTCGAGGACG 8160
GTGACGCGTA CCGCCGGGGC CCCCGCTCCG GGGGCCCGGT TCTCCAGCAG CATGCGCACA 8220
TAGGCCGAGC CGATGAAACC GACCGGGCCG GTGACCAGGA TGTTCACGTC CGTCGTCGCG 8280
GAGGTGTGCG ACGCCATGGG TTCCCTCGAT CCGTCGGGTG CCGTGGGGCG GAGTGCGCCC 8340
CCTCGACCCA GCGTCGGGGG CGGCCGTGGA GGAGCGGTTG AGCTTCGGCG CAGCGGCGGC 8400
TCGACCGGCG GCGGCCGGCG TCGCCGGACT CCAACGGTTC TCGACGGAAC GACCAACGGC 8460
CCTGGCGAGA CTGCCCGGAC AGCCCGGCCG AGAGAGGGAG GACCCGTTGA GCCGTCAGAC 8520
AGAGATCGTC CGCCGGATGG TGAGCGCCTT CAACACCGGC AGGACCGACG ACGTGGACGA 8580
GTACATCCAC CCCGACTACC TCAATCCGGC CACCTTGGAA CACGGCATCC ACACCGGGCC 8640
CAAGGCGTTC GCCCAGCTGG TCGGCTGGGT GCGGGCGACG TTCTCCGAGG AAGCCCGCCT 8700
GGAGGAGGTG CGGATCGAGG AGCGCGGCCC GTGGGTCAAG GCCTACCTCG TGCTCTACGG 8760
CCGCCACGTC GGCCGGCTTG TCGGTATGCC GCCCACCGAC CGGCGCTTCT CCGGTGAACA 8820
GGTGCACCTG ATGCGCATCG TCGACGGGAA GATCCGCGAC CACCGGGACT GGCCCGACTT 8880
CCAGGGGACG CTGCGCCAGC TCGGCGACCC GTGGCCCGAC GACGAGGGCT GGCGTCCGTG 8940
ACCGTCCCTG AAACCGCACC CGACGAGACA TCAGACCAGG AAGGATGGCT CATGCCGGAT 9000
CCCGGCGGCC CGACCACGGC CGAGAACCTG TCGAAGGAGG CTGTCCGCTT CTACCGCGAG 9060
CAGGGTTACG TGCACATCCC GCGCGTCCTG TCGGAGACGG AGGTGACCGC CTTCCGGGCC 9120
GCCTGTGAGG AGGTCCTGGA GAAGGAGGGC CGCGAGATCT CCGGCATCGC CCTGCGGCTG 9180
GCCGGCGCGC CCCTGCGGGT CTACAGCAGC GACATCCTGG TCAAGGAGCC CAAGCGCACC 9240
CTGCCCACCC TGGTCCACGA CGACGAGACG GGACTGCCGC TGAACGAGCT GAGTGCCACG 9300
CTGACGGCCT GGATCGCGCT GACGGACGTA CCCGTCGAAC GCGGCTGCAT GAGCTACGTG 9360
CCGGGCTCCC ATCTCAGGGC CCGCGAGGAC CGGCAGGAGC ACATGACCAG CTTCGCCGAG 9420
TTCCGGGACC TCGCGGACGT GTGGCCCGAT TACCCGTGGC AGCCGCGCGT CGCCGTGCCC 9480
GTCCGCGCCG GAGACGTCGT GTTCCACCAT TGCCGTACCG TCCACATGGC CGAAGCCAAC 9540
ACCAGCGACT CGGTCCGCAT GGCGCATGGC GTCGTCTACA TGGACGCGGA CGCCACCTAC 9600
CGGCCGGGCG TCCAGGACGG GCACCTGTCC CGCCTGTCGC CGGGAGATCC ACTCGAAGGC 9660
GAGCTGTTCC CCCTGGTCAC GGCAGGGACA CGGCAGTGAG GTCCGCCGTT CCCGGGGGTC 9720
GCGGGACCGC CGGGGACGGC ACCGTCAGCC GGCCAGCGCC ACGAGGTTGG CGGGGGTGTC 9780
GGCCGGCGGC GGCATCTCGC TCATCTCCTG CCGCACCCGC AGGGCCGCCT CCCGCAACCC 9840
CGCGTCGTCC AGCAGCCGTC GGCACTGCTC GGCACCCAGC GATCCCGCCT CGGCATCGAA 9900
CCCGATGCCC AGCCCGGTCA GCACATCGCG GTTGGTGTCC TGGTAGGAGC CGTGCGGGAT 9960
GACGCACTGC GGGACGCCGG CGGCCAGGGC CGTCAGCAGT GTGCCGCTGC CCCCGTGATG 10020
GATGATCGCG TCGCACGTCT CCAGCAGCGC GCCCAGCGGA ATCCACTCCA CCACCGGTAC 10080
GTTCGCGGGC AGTTCACCGA GCAGGGCCAG GTCGCCGCCG CCCAGGGTCA GCACGAACTC 10140
CGCGTCCACG TCCGCCACTT CGGAGAACAG CGGGGCCAGC TTGGCGATGC CGCCCGACAG 10200
CGCGTCGATG GAGCCCAGCG TCACCGCGAT ACGCCGCCGG CCGGCCGCGG GCGGCAGCCA 10260
GTCCGGCAGC ACQGCTeeGC CGTTGTAGGG GAGGTAEGGe ATGGGCGAGG CACCCGGGGA 10320
GCGCCGGTCC-TCCGGQAGaA-GCGCCTCG^C-GCTCGGGGG37-GTCGTCGTeA~GCCG^^^^ 10380
ACCGGTCGGG^TCGGGGGTGAs^GGeGGTGGCG ©TeGTAGTGG- TTGGAC^TGG: GCGGGGGGAT 10440
GAGCGCGCCGftmGQG,eeGGeT^^CGC~TGTGleGeGGGAG(5eAGC*GGGAGG,TCTATGGGAGGGCAG 10500
TTGCAGCGCT-'-GCGGCCGT(5gaGCGGGeeCGG»<GCGGTG TGT® GGAG.TGTGSA- 'GGAGGAGGTC 10560
GGGCCGCCAG..CTGGGGG'GGGr4ceGGAGG6feteCCeGTGGAGGiGGGACeGGGGnATAeGGGGGC 10620
GAACATCTCG GCGAAGAAGC CCTCGCCCAG CCCCTCGGAG TGCATCGGGT CGGTGACGTC 10680
GGTGTCGTCG GGCACGAAGA GCTTCGCGTA GTTCACGCCG GGCGACACGT CCACGGCGCA 10740
CAGCCCGGCC TCCGCGACGG CGCGGATGTC GCCCCCCGTG GCGTAGCGGA CCTCGTGGCC 10800
GAGAGCGCGC AGCGCCTGTG CCAGCGGCAC CGTCGGCAGG ATGTGGCTGA GCCCGGGTGA 10860
AGTGATGAAC AACGCACGCA TGATGCCCCC TGTTCGACAT GAACCTGGAA CACGCATCCT 10920
GACGGCGCCT TCTGTTGCTC CGGTCGACGC CCGGTCGACA GGCCCTCGTA CAGCCCGCCG 10980
GGGGCCGGTC CGGCCACGAG GCAGGCTCCA GCGGACGTCG ACGGCGGGGA CGCAGCGTGG 11040
TCGCCGGGAG«GCA^FeG7«mA%GAGmTTGGT^AASGGGASG.SeAGAGGAAAG^^^^ 11100
GGTCGTCACC^GGGG.TACTGGWGCGCGGGeeG GCGGSrTGGGSilGCG&TGAGGe-GCAGAeGeGA- 11160
CCGGTCeGGG*&T(S%S^GGGGG»?GGG-@GGlKSA<r*&At&A@GeGG@#GAe^^^^^ 11220
CTACGAGCGG ATGCTGGACG GTGTCGAAGC CGTCTACCTG TTCCCCGTCC CGGAGACCGC 11280
CGCGGCGTTC GCCGGGGCCG CGCGACGGGC CGGTGTCCGG CGGATCGTGG TGCTCTCCTC 11340
GGACTCCGTC ACCGACGGCA CCGACACCGG AGGACACCGG CGCGTGGAAC TGGCCGTGGA 11400
GGACACGGGG CTCGAGTGGA CCCATGTGCG CCCCGGCGAG TTCGCGCTCA ACAAGGTCAC 11460








CCTGTGGGCG CCGTCGATCC GCGCGGAGGG CGTCGTCCGG TCCGCGTATC CGGACGCCCG 11520
GGTGGCCCCG GTGCACGAGG CCGACGTCGC GGCCGTCGCG GTGACCGCGC TGCTGAAGGA 11580
GGGGCACGCC GGCCGCGCCT ACAGCGTGAC CGGACCGCAG GCCCTCACCC AGCGCGAACA 11640
GGTCCGCGCG GTAGGGGAGG GGCTCGGCCG GTCCCTCGCC TTCGTCGAGG TGACCCCCGG 11700
GCAGGCGCGG GCCGACCTGA CCGCCCAGGG GCTGCCCGCG CCCATCGCCG ACTACGTCCT 11760
CGCCTTCCAA GCCGGGTGGA CCGAGCGGCC CGCCCCCGCC CGGCCGACCG TGCGGGAGGT 11820
CACCGGCCGG CCCGCCCGCA CGCTCGCCCA GTGGGCCGCC GACCACCGAG CGGACTTCCG 11880
GTGACCGGAG ACCGCGTCCA CCGCGCCACG ACAGAAAGGC GACGCCCGTG CGCGTACTGC 11940
TGACGTCCTT CGCCATGGAC GCCCACTTCT GCACCGCCGT GCCGCTGGCG TGGGCACTGC 12000
GGTCGGCCGG GCACGAGGTA CGGGTGGCCG GCCAGCCCGC GCTCACCTCC ACCATCACGG 12060
GAGCCGGCCT GACCGCCGTG CCGGTCGGCC GCGACCACAC GCACGGCAGC CTCCTGGGCC 12120
GGGTCGGCAG CGACATCCTC GCCCTGCACG ACGAGGCGGA CTACCTGGAG GCCraTCACG 12180
ACGCCCTGGG CTTCGAGTTC CTCAAAGGGC ACAACACGGT GATGTCCGCG TTGTTCTACT 12240
CGCAGATCAA CAACGACTCG ATGGTCGACG ACCTGGTGGA CTTCGCCCGT CACTGGCGGC 12300
CCGACCTGGT CGTCTGGGAG CCGTTCACCT TCGCGGGCGC CGTGGCCGCG CGGGCCTCGG 12360
GCGCCGCCCA CGCCCGCCTG CTGTCCTXeC GCGACCTGTT CCTCAGCACG CGCCGCCTCT 12420
TCCTGGAGCG CATGGCGCGC CAGGAGCCCG AGCATCACGA CGACACACTC GCCGAATGGC 12480
TCGACTGGAC CCTTGGCCGG CACGGCCACT CCTTCGACGA GGAGATCGTC ACGGGGCAGT 12540
GGTCCATCGA CCAGACCCCC GCCCCCGTGC GGCTCGACGC CGGCGGTCCC ACCGTGCCGA 12600
TGCGGTACGT CCCCTACAGC GGACTGGTGC CCACAGTGGT GCCCGACTGG CTGCGCAGGC 12660
CGCCCGAGCG GCCACGGGTC CTGGTCACCC TCGGCATCAC CTCACGGCGG GTGAAGTCCT 12720
TCCTCGCCGT CTCCGTGGAC GACCTTTTCG AGGCCGTGGC CGGGCTCGGC GTCGAGGTGG 12780
TCGCCACCCT CGACGCCGAC CAGCGGGAGC TGCTGGGGCG CGTGCCGGAC CACTTCCGCA 12840
TCGTCGAGCA CGTGCCGCTG GACGCCGTTC TGCCGACCTG CTCGGCGATC GTCCACCACG 12900
GCGGAGCCGG CACCTGGTCG ACGGCCGCCG TGTACGGGGT GCCGCAGGTC TCCCTGGGCT 12960
CGATGTGGGA CCACTTCTAC CGGGCCCGTC GCCTGGAGGA ACTCGGGGCG GGGCTGCGGC 13020
TGCCCTCCGG CGAGCTGACT GCCGAGGGGC TGCGCACCCG GCTGGAGAGG GTGCTCGGCG 13080
AGCCCTCCTT CGGCACCGCC GCGCAGGCGC TGAGCGACAC CATCGCGGCG GT^CCCAGCC 13140
CCAGCGAGGT CGTGCCGGTC CTGGAGGAGC TGACCGGACG GCACCGTCCC GGCACCCGGG 13200
NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 13260
NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN CCGTCCGGGC CCCTCGCCGG 13320
TGAGGGAGCC CGGATCACAG TCCGTCCGGC ACCACGCCCA GGTCCCGGAA CAGCGGGGAG 13380
AAGTTGAAGA CGTCCCAGTG CTCCACGACC TTGCCGGCTT CGGAGAAGCG CAGCTCCTCC 13440
AAGTAGGTCC AGCGGACCTT GCGGCCGGTG GGGGCGATGC CCATGTVACAC GCCCTGGTGC 13500
GTGGCCGAGC AGQTGATCCG CAGCATCACG CGGTCGCCCT CGCCCACGAT GCTCCGCACG 13560
TCCAGACGAA GGTCCGGGAA GGCCTCCACC GCGCTGTTCA TACGCCGTAC GACCTCCTCG 13620
GCGCTCACCG GTTTGTCCTC GTCGTCGTAG TGGACGACGT CGGGTGCCCA GTGCGCGACC 13680
ACCCCGGAGA CGTCCCACCG GTTCCATGCG GCCACCATCT CCAGGCAGCG TTCCTTGTTC 13740
GCGGTCGTTG ACATGTCGAC TCCTTGAAGG CCCGGGACTA CTGGTCACGC GCCAGGCTTG 13800
CAACCCGCCC CGGAAAAGGG GTGCACGACC GCTGGAGCCG GCACCGGAAC CTGCGCGGCG 13860
GAGCTGAAGG GGGTTTCGAG CCGTTCACCA AGGACCTGCC GCAGGCTGTT ACGGCAGACC 13920
CTGACGCCTC GCTCCGCGCG GGACGCGCCC GCCGGGAGGA AGGACACACC ACCATGTCGG 13980
TACGCACCGA TCAGACGGCG GCACCGGAAG ACCGAGCGGC GGCCACGGAT CCCGGGTTCG 14040
GGCACCTGTA CGCGCAGGTG CAGCAGTTCT ACGCCCGGCA GATGCAGCTC CTCGACTCCG 14100
GCGCGGCCGA GGAGTGGGCC GCCACCTTCA CCGAGGACGG CACGTTCGCC CGGCCCTCCT 14160
CGCCGGAACC GGCACGCGGC CACGCCGAAC TGGCCGCCGG CGCCCGCGCC GCCGCCGAAC 14220
GCCTCGCCGC CGAGGGCCTT TCGCACCGGC ACGTCATCGG CATGACCGCG GTACGCCGGG 14280
AACCCGACGG CAGCGTGTTC GTACGCAGCT ACGCCCAGGT CTTCGCCACC CGCCGCGGGG 14340
AAGCTCCCCG GCTGGATCTG ATCTGCGTCT GCGAGGACGT GCTCGTGCGG GAGGGGCCGG 14400
GGCTGAAGGT GCGGGAACGG GTTGTCACGC ACGACGCGTG AGGGCGGTCG ACGCGCGGGC 14460
CGAGCCGCAC CTCTGGCACC CCCTCGGCAC GCCAGGjENSGG^^GTCGAGTCCG;^^^ 14520
GCGCACTTAG CGTGCGAGCC ATGAGTGACT CGACAGGTCC GCGCCCGGTG CCCGCCATGT 14580
CACCCGCCCC CAGCGGCACG CCTTCCCCCG GCCCCGCGCC CGGGAGCGAA CCCGCGCCGC 14640
TCGCCGTGAT CGTCACCGGC GGCGGTTCGG GTATCGGGGG GGGCACCGCC CGGGGGTTCG 14700
CCGCTCAGGG TGCGAAGGTG CTCGTCGTCG GCCGTACCGA GGACGCGCTC GCGCAGACCG 14760
CCGAGGGCTG TGCGGACATG CGTGTGCTCG TCGCCGACGT GGGCTCGCCC GAGGGGCCGG 14820
AGGCGGTCGT CAACGCCGCC CTGCGGGAGT TCGGGAGGAT CGACGTCCTG GTCAACAACG 14880
CTGCCGTGGC GGGCATGGAG ACCCTGCAGA CCGTCGACCG GGACGCCGTG GCACGGCAGT 14940
TCGGCACCAA TCTGACGGCT CCCCTCTTCC TCGTCCAGTC CGCACTCGGC GCGCTGGAGA 15000
AGTCGCGCGG CATCGTCGTC AACGTGGGGA CCGCCGCGAC CCTGGGCCTG CGCGCCGCCC 15060
CGACCGGCGC GCTGTACGGG GCGAGCAAGG TGGCCCTCGA CTACCTGACC CGGACCTGGG 15120
CCGTCGAACT GGCCCCCCGG GGCATCCGTG TCGTCGGGGT GGGACCCGGA GTGATCGACA 15180
CGGGCATCGG CGTGCGCATG GGCATGACCG CGGAGGGGTA GCGGGAGTTC CTGACCGGCA 15240
TGGGCGGCAG GGTGCCGGTG GGCGGGGTCG GCCGTCGGGA GGAGGTGGeC TGGTGGATCG 15300
TCCAGCTCGC CCGCCCGGAG GCCGGCTACG CGACGGGCAT GGTCGTCCCC GTCGACGGCG 15360
GGCTGTCGCT GGTCTGACCG GACAAGGAAG GAAATACCGC AGGAAGGAAG TACCGCAGCA 15420
AGGAAATACC GCAGGAAGGA GATATCGCCG TGCAGGAAAC CGAACCCGGC GTCCCCGCGG 15480
ACCTGCCCGC CGAGAGCGAC CCTGCCGCCC TGGAGCGCCT CGCCGCACGG TACCGGCGGG 15540
ACGGCTACGT CCACGTCCCC GGCGTCCTCG ACGCCGGGGA GGTCGCCGAA TACCTGGCCG 15600
AGGCCCGTCG GCTCCTCGCC CACGAGGAGT CCGTGCGCTG GGGCTCCGGC GCCGGCACCG 15660
TCATGGACTA CGTCGCCGAC GCCCAGCTCG GCAGCGACAC GATGCGCCGC CTTGCCACCC 15720
ACCCGCGCAT CGCCGCCCTC GCCGAGTACC TGGCCGGCTC GCCCCTGAGG CTGTTCAAGC 15780
TGGAGGTGCT GCTCAAGGAG AACAAGGAGA AGGACGCCTC GGTCCCCACC GCCCCGCACC 15840
ACGATGCGTT CGCCTTCCCG TTCTCCACCG CCGGCACCGC CCTGACGGCG TGGGTCGCGC 15900
TGGTCGACGT CCCGGTGGAA CGCGGCTGCA TGACCTTCGT CCCCGGATCA CACCTGCTGC 15960
CGGATCCCGA TACCGGCGAC GAGCCGTGGG CCGGGGCCTT CACCCGGCCG GGAGAGATCT 16020
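
The relationship between the nucleotide listing and the translated sequences that follow can be illustrated with a short, hedged Python sketch. It is not part of the original listing: the names CODON_TABLE, translate and SNOGJ_START are hypothetical, the standard genetic code is assumed, and the 48 nucleotides used are positions 1192 to 1239 of SEQ ID NO: 1 as printed above (the start of the snogJ reading frame).

```python
# Standard genetic code, built from the conventional TCAG-ordered string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Positions 1192..1239 of SEQ ID NO: 1, copied from the listing above.
SNOGJ_START = "GTGAAGGGAATCATTCTCGCCGGGGGTACGGGGAGCAGGCTCCACCCG"

print(translate(SNOGJ_START))  # VKGIILAGGTGSRLHP
```

The one-letter result corresponds to Val Lys Gly Ile Ile Leu Ala Gly Gly Thr Gly Ser Arg Leu His Pro, the first sixteen residues given for SEQ ID NO: 3. The sketch reads the GTG start codon literally as valine, as the listing does.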

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 342 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(D) OTHER INFORMATION: /note= "translate of snogI"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Thr Val His Val Trp Asp Tyr Leu Pro Glu Tyr Glu Leu Glu Arg
1               5                   10                  15

Glu Asp Ile His Asp Ala Val Glu Thr Val Phe Arg Ser Gly Arg Leu
            20                  25                  30

Val Leu Gly Glu Ser Val Arg Gly Phe Glu Ser Glu Phe Ala Ser Phe
        35                  40                  45

Gln Gly Val Gly His Ala Val Gly Val Asp Asn Gly Thr Asn Ala Val
    50                  55                  60

Lys Leu Gly Leu Gln Ala Leu Gly Val Gly Pro Gly Asp Glu Val Val
65                  70                  75                  80

Thr Val Ser Asn Thr Ala Ala Pro Thr Val Val Ala Ile Asp Ser Ala
                85                  90                  95

Gly Ala Thr Pro Val Phe Val Asp Val Arg Glu Glu Asp Tyr Leu Met
            100                 105                 110

Asp Thr Ser Gln Val Glu Ala Val Leu Thr Pro Arg Thr Arg Cys Leu
        115                 120                 125

Leu Pro Val His Leu Tyr Gly Gln Cys Val Asp Met Ala Pro Leu Arg
    130                 135                 140

Asp Leu Ala Ala Arg His Asn Leu Val Ile Leu Glu Asp Cys Ala Gln
145                 150                 155                 160

Ala His Gly Ala Arg Arg His Gly Arg Leu Ala Gly Ser Thr Gly Asp
                165                 170                 175

Ala Ala Ala Phe Ser Phe Tyr Pro Thr Lys Val Leu Gly Ala Tyr Gly
            180                 185                 190

Asp Gly Gly Ala Val Leu Thr Asp Asp Glu Arg Val Ala Asp Arg Leu
        195                 200                 205

Arg Arg Leu Arg Tyr Tyr Gly Met Glu Ser Arg Tyr Tyr Val Val Glu
    210                 215                 220

Thr Pro Gly His Asn Ser Arg Leu Asp Glu Val Gln Ala Glu Ile Leu
225                 230                 235                 240

Arg Arg Lys Leu Ser Arg Leu Pro Ser Tyr Ile Glu Ala Arg Arg Ala
                245                 250                 255

Val Ala Arg Arg Tyr Glu Glu Gly Leu Ala Asp Thr Gly Leu Leu Leu
            260                 265                 270

Pro Arg Thr Ala Gln Gly Asn Glu His Val Tyr Tyr Val Tyr Val Val
        275                 280                 285

Arg His Pro Arg Arg Asp Ala Val Leu Glu Ala Leu Arg Ala Ser Tyr
    290                 295                 300

Asp Ile Ala Leu Asn Ile Ser Tyr Pro Trp Pro Val His Thr Met Thr
305                 310                 315                 320

Gly Phe Ser His Leu Gly Tyr Ala Lys Gly Ser Leu Pro Val Thr Glu
                325                 330                 335

Ala Leu Ala Asp Glu Ile
            340



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 293 amino acids
(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(D) OTHER INFORMATION: /note= "translate of snogJ"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Val Lys Gly Ile Ile Leu Ala Gly Gly Thr Gly Ser Arg Leu His Pro
1 5 10 15

Thr Thr Leu Ala Val Ser Lys Gln Leu Leu Pro Val Gly Asp Lys Pro
20 25 30

Met Ile Tyr Tyr Pro Leu Ser Val Leu Met Leu Ala Gly Val Thr Asp
35 40 45

Ile Leu Ile Ile Ser Thr Pro His Glu Leu Pro Arg Met Arg Arg Leu
50 55 60

Phe Glyv, :AspK»Gl!*Sr^ Al^a - Gin Leu*.Gly Leuf^Argv Leu#Al'a,*i-Tyr . Al'a.^Glu Gin 
65 70 75 80

Glu Lys Pro Arg Gly lie Ala GTii Ala Phe Leu lie Gly Ala Asp His 
85 90 95 

Val Gly Ser Asp Ala Val Ala Leu Ala Leu Gly Asp Asn Ile Phe His
100 105 110

Gly Ser Ser Phe Gln Gly Val Leu Arg Lys Glu Ala Glu Glu Leu Asp
115 120 125

Gly Cys Val Leu Phe Gly Tyr Pro Val Lys Asp Pro Gln Arg Tyr Gly
130 135 140

Val Gly Glu Ala Asn Ala Ser Gly Arg Leu Val Ser Ile Glu Glu Lys
145 150 155 160

Pro Val Arg Pro Arg Ser Asn Arg Ala Ile Thr Gly Leu Tyr Phe Tyr
165 170 175

Asp Asn Glu Val Val Asp Ile Ala Arg Arg Leu Arg Pro Ser Ala Arg
180 185 190





Glu 


Lf»ii 


Glu 


Tip 


Th-r 


Asp 


Tie 




Arg 


Thr 


Tyr- 


Met 


Rill Arg 


niy 






195 










200 










205 






Arg Ala Arg Leu Val Asp Leu Gly Arg Gly Phe Ala Trp Leu Asp Thr
210 215 220

Gly Thr Pro Glu Ser Leu Leu Gln Ala Ser Gln Tyr Val Ser Ala Leu
225 230 235 240

Glu Glu Arg Gln Gly Ile Arg Ile Ala Cys Ile Glu Glu Val Ala Leu
245 250 255

Arg Met Gly Phe Ile Asn Ala Gln Ala Cys Tyr Glu Leu Gly Ala Arg
260 265 270

Leu Ser Gly Ser Gly Tyr Gly Gln Tyr Val Met Ala Ile Ala Glu Glu
275 280 285

Cys Thr Gly Arg Val
290



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 238 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 

(D) OTHER INFORMATION: /note= "translate of snogA" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Val Tyr Gly Arg Glu Leu Ala Asp Val Tyr Glu Met Val Tyr Arg Ser
1 5 10 15

Arg Gly Lys Ser Trp Ala Asp Glu Ala Glu Arg Val Thr Ala Glu Ile
20 25 30

Arg Ser Arg Arg Pro Gly Ala Arg Ser Leu Leu Asp Val Ala Cys Gly
35 40 45

Thr Gly Ala His Leu Glu Ala Phe Arg Gly Leu Phe Ala His Thr Glu
50 55 60

Gly Leu Glu Leu Ser Asp Glu Met Arg Ala Leu Ala Glu Arg Arg Leu
65 70 75 80

Pro Gly Val Pro Val Arg Pro Gly Asp Met Arg Asp Phe Ala Leu Ser
85 90 95

Gly Arg Phe Asp Ala Val Val Cys Leu Phe Cys Ser Ile Gly Tyr Leu
100 105 110

Glu Thr Val Ala Asp Met Arg Ala Ala Val Arg Thr Met Ala Ala His
115 120 125

Leu Val Pro Gly Gly Val Leu Val Val Glu Pro Trp Trp Phe Pro Glu
130 135 140

Arg Phe Leu Glu Gly Tyr Val Ala Gly Asp Leu Ala Arg Gly Glu Gly
145 150 155 160

Arg Thr Val Ala Arg Val Ser His Ser Thr Arg Gln Gly Arg Arg Thr
165 170 175

Arg Met Glu Val Arg Phe Leu Val Gly Glu Ala Thr Gly Ile Arg Glu
180 185 190

Phe Thr Glu Ile Asp Leu Leu Thr Leu Phe Thr Arg Glu Glu Tyr Leu
195 200 205

Ala Ala Phe Glu Asp Ala Gly Cys Pro Ala Glu Phe Leu Asp Asp Gly
210 215 220

Leu Thr Gly Arg Gly Leu Phe Val Gly Val Arg Gly Ala Gly
225 230 235

(2) INFORMATION FOR SEQ ID NO : 5: 

(i) SEQUENCE CHARACTERISTICS: 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(D) OTHER INFORMATION: /note= "translate of snoaM"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Th-ri^Ala Ala Trp^.Gly Ala'^^ Pro Pro Trp -lie Pro Ala 

1 5 10 15 

Arg Pro Gly Arg Arg Arg Cys Gly Ala Gly Arg Arg Val Arg Cys Pro 
20 25 30 

Pro Val Glu Pro Ala Ser Arg Pro Arg Gln Glu Gly Arg Val Ser Val
35 40 45

Val Pro Ala Leu Arg Gln Pro Ser Pro Ser Thr Asn Pro Glu Val Arg
50 55 60

Val Arg Leu Ile Asp Leu Ser Ser Pro Val Asp Ser Ser Gln Tyr Glu
65 70 75 80

Pro Asps^Pr©j^Val^* Va-l«i^Hi^s^ Aspi^Va-M Glnt-Gl>^ Ala Glu 

85 90 95

His Met Cys Ala Glu Met Arg Glu His Phe Gly Val Glu Phe Ser Pro
100 105 110

Asp Gl\3; Leui^Pro^Asp**G15^ Glu Ph'l;*' Leu Se^r Leu Asp Arg lie Thr Leu 
115 120 125 

Thr Thr His Thr Gly Thr His Val Asp Ala Pro Ser His Tyr Gly Ser 
130 135 140 

Arg Ala Leu Tyr Gly Asp Gly Val Pro Arg His Ile Asp Gln Met Pro
145 150 155 160

Leu Glu Trp Phe Phe Gly Arg Gly Val Val Leu Asp Leu Thr Asp Ala
165 170 175

Pro Thr Gly Thr Val Ser Ala Ala Arg Leu Glu Lys Glu Leu Ala Arg
180 185 190

Thr Gly Cys Ala Leu Arg Pro Gly Asp Ile Val Leu Leu His Thr Gly
195 200 205

Ala Gln Arg His Ala Gly Thr Pro Arg Tyr Phe Thr Asp Phe Ala Gly
210 215 220

Leu Asp Gly Pro Ala Val Arg Met Leu Leu Asp His Gly Val Arg Val
225 230 235 240

Ile Gly Thr Asp Ala Phe Ser Leu Asp Ala Pro Phe Gly His Ile Ile
245 250 255

Asp Arg Tyr Arg Ala Thr Gly Asp Arg Ser Val Leu Trp Pro Ala His
260 265 270

Val Val Gly Arg Glu Arg Glu Tyr Cys Gln Ile Glu Arg Leu Ala Asn
275 280 285

Leu Asp Arg Leu Pro Val Ser Phe Gly Phe Arg Val Cys Cys Phe Pro
290 295 300

Val Lys Val Ala Gly Ala Gly Ala Gly Trp Thr Arg Ala Val Ala Leu
305 310 315 320

Val Asp Glu Asp



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 408 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 



(ii) MOLECULE TYPE: peptide 

(D) OTHER INFORMATION: /note= "translate of snogN" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Val Met Lys Leu Thr Asp Ser Glu Leu Gly Arg Ala Leu Leu Ser
1 5 10 15

Leu Arg Gly Tyr Gln Trp Leu Arg Gly Ile His His Asp Pro Tyr Ala
20 25 30

Leu Leu Leu Arg Ala Glu Ser Asp Asp Pro Ala Gln Leu Gly Arg Leu
35 40 45

Leu Arg Glu Arg Gly Arg Leu His Arg Ser Asp Thr Gly Thr Trp Val
50 55 60

Thr Ala Asp His Ala Thr Ala Ser Arg Leu Leu Ala Asp Pro Arg Phe
65 70 75 80

Val Leu Arg Arg Pro Pro Ala Gly Pro Ala Thr Gly Thr Gly Asp Val
85 90 95

Met Pro Trp Glu Glu Ala Thr Leu Ser Asp Leu Leu Pro Leu Asp Glu
100 105 110

Ala Arg Leu Thr Thr Asp Arg Ala Arg Cys Arg Arg Leu Gly Ala Thr
115 120 125

Ala Ala Arg Ile Ala Ala Asp Gly Pro Val Ala Thr Arg Leu Ala Asp
130 135 140

Leu Ala Gly Ala Arg Ala Glu Gln Val Arg Ser Thr Gly His Phe Asp
145 150 155 160

Leu Arg Ala Asp Tyr Ala Leu Pro Tyr Ala Val Glu Pro Ala Cys Ala
165 170 175






T^eu 


Gly 


Leu 


Pro 


Ala 


Glv Gin 


Cys 


Se?r 




Phe 


Glv Ala 




Ser : 










180 










185 










ISO 








Pro Ala Val Leu Leu Asp Ala Thr Val Val Pro Pro Arg Leu Pro Glu
195 200 205

Ala Arg Ala Leu Ile Ala Ser Thr Ala Glu Leu Thr Ala Leu Trp Pro
210 215 220

Arg Leu Ala Pro Ser Leu Ser Lys Thr Val Pro Glu Asp Glu Ala Pro
225 230 235 240

Asp Leu Phe Leu Leu Thr Ala Val Leu Leu Val Pro Ala Val Val His
245 250 255

Leu Val Cys Glu Ala Val Ala Ala Leu Ser His Asp Pro Gly Gln Ala
260 265 270

Gly Leu Leu Arg Asp Asp Pro Val Leu Ala Ala Pro Ala Val Glu Glu
275 280 285

Thr Leu Arg His Ala Pro Pro Ala Arg Leu Phe Thr Leu His Ala Thr
290 295 300

Gly Pro Glu Arg Val Ala Asp Val Asp Leu Pro Ala Gly Ala Glu Val
305 310 315 320

Ala Val Val Val Ala Ala Ala His Arg Asp Pro Ser Trp Cys Pro Asp
325 330 335

Pro Asp Arg Phe Asp Leu Thr Arg Asn Glu Arg His Leu Ala Leu Pro
340 345 350


Pro 


ASD 


Leu 
355 


Pro 


Leu 


Gly 


Ala 


Leu 
360 


Ala 


Pro 


Leu 


Leu 


Arg 
365 


Val 


Cys 


a 


* » * I * J 

J 


Thr Ala Ala Val Ala Ala Leu Ala Ala Gly Leu Leu Pro Leu Arg Ala
370 375 380

Val Gly Pro Pro Val Arg Arg Leu Arg Ala Pro Val Thr Arg Ser Val
385 390 395 400

Leu Arg Phe Pro Val Ala Pro Cys
405



















(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 422 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(D) OTHER INFORMATION: /note= "translate of snoaG"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Asp Asn Arg Glu Thr Val Arg Pro Val Ser Val Cys Arg Val Cys
1 5 10 15








nl \r 
\j j^y 


Asn 


•20 


Gin 


Asp 


Va 1 Va 1 
25 


Asp 


Phe 


Glv 


Asp 


Val 
30 


Pro Leu 






Ala Asn Gly Phe Leu Ser Pro Ala Asp Ser Tyr Glu Asn Glu Arg Arg
35 40 45

Tyr Pro Leu Gly Val Leu Ser Cys Arg Ala Cys Arg Leu Met Ser Leu
50 55 60

Thr His Val Val Asp Pro Glu Val Leu Tyr Arg Asp Tyr Ala Tyr Thr
65 70 75 80






mr- 


pro 


ASp 


ser Glu 
85 


Met 


He 


Thr Gin 


His 
90 


Met 


Arg 


His 


-tte- 


Thj. Ala 

95 






Leu Cys Arg Thr Arg Phe Glu Leu Pro Pro Asp Ser Leu Val Val Glu
100 105 110

Leu Gly Ser Asn Thr Gly Arg Gln Leu Met Ala Phe Arg Glu Ala Gly
115 120 125

Met Arg Thr Leu Gly Val Asp Pro Ala Arg Asn Leu Thr Asp Val Ala
130 135 140

Arg Arg Asn Gly Ile Glu Thr Phe Pro Asp Phe Phe Ser His Asp Val
145 150 155 160

Ala Arg Thr Ile Arg Arg Asp His Gly Gln Ala Arg Leu Val Leu Gly
165 170 175








H-in 


vnl 


Phn Aln- 




Tie 


Asp Asp- 


-Vft4- 


Ser 


Asp 






Ala Gly — 












180 






185 










190 








Val Arg Glu Leu Leu Ser Pro Asp Gly Val Phe Ala Ile Glu Val Pro
195 200 205

Tyr Val Leu Asp Leu Leu Glu Lys Val Ala Phe Asp Thr Ile Tyr His
210 215 220

Glu His Leu Ser Tyr Phe Thr Met Arg Ser Phe Val Thr Leu Phe Ala
225 230 235 240

Arg His Gly Leu Arg Val Leu Asp Val Glu Arg Phe Gly Val His Gly
245 250 255

Gly Ser Val Leu Val Phe Val Gly His Glu Asp Gly Pro Trp Pro Glu
260 265 270

Arg Pro Ser Val Pro Glu Leu Leu Arg Val Glu Arg Gln Arg Gly Leu
275 280 285

Tyr Asp Asp Ala Thr Tyr Arg Thr Phe Ala Gln Arg Ile Glu Arg Val
290 295 300

Arg Thr Glu Leu Pro Glu Leu Leu Arg Ser Leu Val Ala Gln Gly Lys
305 310 315 320

Arg Ile Val Gly Tyr Gly Ala Pro Ala Lys Gly Asn Thr Ile Leu Thr
325 330 335

Val Cys Gly Leu Gly Leu Lys Glu Leu Glu Tyr Cys Thr Asp Thr Thr
340 345 350

Glu Leu Lys Gln Gly Arg Val Leu Pro Gly Thr His Ile Pro Val His
355 360 365

Ala Pro Glu His Ala Lys Glu His Ile Pro Asp Tyr Tyr Leu Leu Leu
370 375 380

Ala Trp Asn Tyr Ala Thr Glu Ile Leu Asp Lys Glu Thr Ala Phe Arg
385 390 395 400

Asp Asn Gly Gly Arg Phe Ile Val Pro Ile Pro Arg Pro Ser Ile Leu
405 410 415

Thr Ser Pro Ser Gly Ser
420



(2) INFORMATION FOR SEQ ID NO: 8: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 291 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(D) OTHER INFORMATION: /note= "translate of snogC" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Leu Ala Arg His Leu Thr Ala Ala Leu Ala Glu Thr Gly Arg Ser
1 5 10 15



Arg Pro?gf^Ala.^la^Gl^ii*^la^ Leu^Asp Ile 

20 25 30



Thr Asp nl y Arg A1 fi:" Vnl "A np^^^A Ai n Phe Ala Ala Hi s Arg Pro Arg 



35 40 45



Val Vai'NkVal*^Asnir.Cys^^Ala Ala 'PheiH*^Th3SftrAsp4*^aO.^ Ser 
50 55 60

Arg Trp Ala Glu Ala Met Arg Val Asn Gly Gly Gly Pro Arg Leu Leu 
65 70 75 80 

Ala Arg Arg Cys Ala Arg His Gly Val Arg Leu Ile His Val Ser Thr
85 90 95 

Asp Tyr Val Phe Pro Gly Asp Thr Arg Ser Pro Tyr Gly Glu Ser Asp 
100 105 110 

Ala Pro Gly Pro Arg Thr Val Tyr Gly Arg Ser Lys Leu Ala Gly Glu 
115 120 125 

Arg Ala ya^l^Leu^ S e-ist^i^Leu^ Leu^-P r o^^AspiirThlG^G l^^^'E^^^ Thr 
130 135 140

Ala Trpis?Leu*iiTy25?-^GLy^Gl.^^^^ Glynv-Arg^ Ser^Phei^VanL^Argi^^hi^ Leu- 
145 150 155 160

Glu Arg Ala Pro Asp Asp -Gl^^ His Val Asp -Val Val Asn Asp Gin Trp 
165 170 175 



Gly Gln Pro Thr Trp Ala Gly Asp Val Ala Arg Leu Leu Val Thr Leu
180 185 190

Ala Arg Thr Pro Pro Asp Arg Ala Arg Gly Ile Phe His Ala Thr Asn
195 200 205

Ala Gly Ala Ala Thr Trp Tyr Glu Leu Ala Arg Glu Val Phe Arg Leu
210 215 220

Ala Gly Ala Asp Pro Glu Arg Val Arg Pro Val Ala Thr Ala Asp Arg
225 230 235 240

Pro Gly Pro Ala Pro Arg Pro Ala Cys Thr Val Leu Gly His Asp Arg
245 250 255

Trp Arg Leu Val Gly Val Ala Pro Pro Arg Asp Trp Arg Ala Ala Leu
260 265 270

Arg Glu Ala Met Arg Gln Leu Leu Pro Gly Gly Arg Leu Arg Asn Leu
275 280 285

Thr Gly Thr
290



(2) INFORMATION FOR SEQ ID NO : 9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 350 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(D) OTHER INFORMATION: /note= "translate of snogK" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Ala Ser His Thr Ser Ala Thr Thr Asp Val Asn Ile Leu Val Thr
1 5 10 15

Gly Ala Va3r-Qly Phe He— Gly Se^Ala Tyr Val Arg Met lieu—Leu Glu 
20 25 30 

Asn Arg Ala Pro Gly Ala Gly Ala Pro Ala Val Arg Val Thr Val Leu
35 40 45

Asp Lys Leu Thr Tyr Ala Gly Asn Leu Thr Asn Leu Asp Ala Val Arg
50 55 60

Gly Asp Arg Leu Arg Phe Val Arg Gly Asp Ile Leu Asp Ala Glu Leu
65 70 75 80

Val Asp Glu Leu Met Ala His Ser Asp Gln Val Val His Phe Ala Ala
85 90 95

Glu Ser His Val Asp Arg Ser Ile Arg Ala Ala Asp Asp Phe Val Leu
100 105 110

Thr Asn Val Val Gly Thr Gln Arg Leu Leu Asp Ala Ala Leu Arg His
115 120 125

Gly Val Glu Pro Phe Val Leu Val Ser Thr Asp Glu Val Tyr Gly Ser
130 135 140

Ile Ala Ser Gly Ser Trp Pro Glu Glu His Pro Leu Ser Pro Asn Ser
145 150 155 160

Pro Tyr Ala Ala Ser Lys Ala Ser Ala Asp Leu Met Ala Phe Ala Cys
165 170 175

His Arg Thr His Gly Leu Asp Val Arg Val Thr Arg Cys Ser Asn Asn
180 185 190

Tyr Gly Pro Arg Gln His Pro Glu Lys Leu Ile Pro Arg Phe Val Thr
195 200 205

Asn Leu Leu Asp Gly Leu Pro Val Pro Leu Tyr Gly Asp Gly Arg Asn
210 215 220

Val Arg Glu Trp Leu His Val Glu Asp His Cys Arg Gly Val Asp Leu
225 230 235 240

Val Arg Thr Ala Gly Arg Pro Gly Gly Val Tyr His Ile Gly Gly Gly
245 250 255

Arg Glu Leu Ser Asn Arg Glu Leu Val Gly Met Leu Leu Glu Leu Cys
260 265 270

Gly Ala Asp Trp Ser Ser Val Arg His Val Pro Asp Arg Lys Gly His
275 280 285

Asp Leu Arg Tyr Ser Leu Asp Trp Gly Arg Ala Arg Glu Glu Leu Gly
290 295 300

Tyr Arg Pro Ala Arg Glu Phe Ser Ser Gly Leu Arg Ser Thr Val Gln
305 310 315 320

Trp Tyr Arg Glu Asn Arg Ser Trp Trp Glu Pro Leu Lys Arg Gly Val
325 330 335

Thr Ala Pro Gly Gly Thr Ser Thr Val Val Pro Gly Val Arg
340 345 350



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 134 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(D) OTHER INFORMATION: /note= "translate of snoaL" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Val Ser Ala Phe Asn Thr Gly Arg Thr Asp Asp Val Asp Glu Tyr
1 5 10 15

Ile His Pro Asp Tyr Leu Asn Pro Ala Thr Leu Glu His Gly Ile His
20 25 30

Thr Gly Pro Lys Ala Phe Ala Gln Leu Val Gly Trp Val Arg Ala Thr
35 40 45

Phe Ser GLu Gl# *Al'at^ Argr-Leu- Gli^^^ GluC Vaa%^^rg^-Il%". GltiV Glu Arq Gly 
50 55 60

Pro Trp VaiL^LySf^Ala -Tyr Leu^Vaa»H^^^ Arg 
65 70 75 80

Leu Val Gly Met Pro Pro Thr Asp Arg Arg Phe Ser Gly Glu Gln Val
85 90 95

His Leu Met Arg Ile Val Asp Gly Lys Ile Arg Asp His Arg Asp Trp
100 105 110

Pro Asp Phe Gln Gly Thr Leu Arg Gln Leu Gly Asp Pro Trp Pro Asp
115 120 125

Asp Glu Gly Trp Arg Pro
130

(2) INFORMATION FOR SEQ ID NO : 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 235 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(D) OTHER INFORMATION: /note= "translate of snoK" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 





Met Pro Asp Pro Gly Gly Pro Thr Thr Ala Glu Asn Leu Ser Lys Glu
1 5 10 15

Ala Val Arg Phe Tyr Arg Glu Gln Gly Tyr Val His Ile Pro Arg Val
20 25 30

Leu Ser Glu Thr Glu Val Thr Ala Phe Arg Ala Ala Cys Glu Glu Val
35 40 45

Leu Glu Lys Glu Gly Arg Glu Ile Ser Gly Ile Ala Leu Arg Leu Ala
50 55 60

Gly Ala Pro Leu Arg Val Tyr Ser Ser Asp Ile Leu Val Lys Glu Pro
65 70 75 80

Lys Arg Thr Leu Pro Thr Leu Val His Asp Asp Glu Thr Gly Leu Pro
85 90 95






Leu^ 


Asn 


Glu 


Leu 


Ser 


Ala 


Thr 


Leu 






Tarp 




Ala 


-l^eu. 


Thr 


Asp 












100 










105 










110 










Val Pro Val Glu Arg Gly Cys Met Ser Tyr Val Pro Gly Ser His Leu
115 120 125

Arg Ala Arg Glu Asp Arg Gln Glu His Met Thr Ser Phe Ala Glu Phe
130 135 140

Arg Asp Leu Ala Asp Val Trp Pro Asp Tyr Pro Trp Gln Pro Arg Val
145 150 155 160

Ala Val Pro Val Arg Ala Gly Asp Val Val Phe His His Cys Arg Thr
165 170 175

Val His Met Ala Glu Ala Asn Thr Ser Asp Ser Val Arg Met Ala His
180 185 190

Gly Val Val Tyr Met Asp Ala Asp Ala Thr Tyr Arg Pro Gly Val Gln
195 200 205

Asp Gly His Leu Ser Arg Leu Ser Pro Gly Asp Pro Leu Glu Gly Glu
210 215 220

Leu Phe Pro Leu Val Thr Ala Gly Thr Arg Gln
225 230 235

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 390 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(D) OTHER INFORMATION: /note= "translate of snogD" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 



Met Arg Val Pro Gly Ser Cys Arg Thr Gly Gly Ile Met Arg Ala Leu
1 5 10 15

Phe Ile Thr Ser Pro Gly Leu Ser His Ile Leu Pro Thr Val Pro Leu
20 25 30

Ala Gln Ala Leu Arg Ala Leu Gly His Glu Val Arg Tyr Ala Thr Gly
35 40 45

Gly Asp Ile Arg Ala Val Ala Glu Ala Gly Leu Cys Ala Val Asp Val
50 55 60

Ser Pro Gly Val Asn Tyr Ala Lys Leu Phe Val Pro Asp Asp Thr Asp
65 70 75 80

Val Thr Asp Pro Met His Ser Glu Gly Leu Gly Glu Gly Phe Phe Ala
85 90 95



Glu Mefe*^PheiMAla>. Argsft^^aa^Ser-wAla, Val%,Ala^^^ Arg 
100 105 110

Thr Al-a ^ Arc^- S eg-^Trp^ Ar < 7-^r Q T . eu^Val-- Val- Hi s JThr^^^-r^^ n-v>T- r;in 



115 120 125

Gly Ala^.Gl-5^ -Pro Leu«^Thr Ala, Ala Ala Leu Gin Leut^.PriD^Cys Val Glu 
130 135 140

Leu Pro Leu Gly Pro Ala Asp Ser Glu Pro Gly Leu Gly Ala Leu Ile
145 150 155 160 

Arg Arg Ala Met Ser Lys Asp Tyr Glu Arg His Gly Val Thr Gly Glu 
165 170 175 

Pro Thr Gly Ser Val Arg Leu Thr Thr Thr Pro Pro Ser Val Glu Ala 
180 185 190 

Leu Leu Pro Glu Asp Arg Arg Ser Pro Gly Ala Trp Pro Met Arg Tyr 
195 200 205 

Val ProwisTyisi^Asn GL^..Gly Ala* Va^ii%Leu*».Pro*-^ Aspi*^Trp^LeU'*»Pr<^'"Pro Ala 
210 215 220

Ala >G1*5^*-Arg*.*.7lrgv Arg^^Ilie A Seos^^Ile^AsptfAla ^Leu' 

225 230 235 240

Ser Gl^ Gl^^ lie" Ala Lys Leu *Ala^ Prb^ Leu Phe Ser Glu^ Val' Ala Asp 
245 250 255 

Val Asp Ala Glu Phe Val Leu Thr Leu Gly Gly Gly Asp Leu Ala Leu 
260 265 270 

Leu Gly Glu Leu Pro Ala Asn Val Pro Val Val Glu Trp Ile Pro Leu
275 280 285

Gly Ala Leu Leu Glu Thr Cys Asp Ala Ile Ile His His Gly Gly Ser
290 295 300

Gly Thr Leu Leu Thr Ala Leu Ala Ala Gly Val Pro Gln Cys Val Ile
305 310 315 320

Pro His Gly Ser Tyr Gln Asp Thr Asn Arg Asp Val Leu Thr Gly Leu
325 330 335

Gly Ile Gly Phe Asp Ala Glu Ala Gly Ser Leu Gly Ala Glu Gln Cys
340 345 350

Arg Arg Leu Leu Asp Asp Ala Gly Leu Arg Glu Ala Ala Leu Arg Val
355 360 365

Arg Gln Glu Met Ser Glu Met Pro Pro Pro Ala Glu Thr Ala Ala Lys
370 375 380

Leu Val Ala Leu Ala Gly
385 390



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 275 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(D) OTHER INFORMATION: /note= "translate of snoW"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Thr Val Leu Val Thr Gly Ala Thr Gly Asn Val Gly Arg His Val
1 5 10 15

Val Thr Gly Leu Leu Ala Ala Gly Arg Arg Val Arg Ala Leu Thr Arg
20 25 30

Thr Pro Asp Arg Ser Gly Leu Pro Gly Gly Ala Glu Ile Thr Gly Gly
35 40 45

Asp Leu Thr Arg Pro Glu Thr Tyr Glu Arg Met Leu Asp Gly Val Glu
50 55 60

Ala Val Tyr Leu Phe Pro Val Pro Glu Thr Ala Ala Ala Phe Ala Gly
65 70 75 80

Ala Ala Arg Arg Ala Gly Val Arg Arg Ile Val Val Leu Ser Ser Asp
85 90 95

Ser Val Thr Asp Gly Thr Asp Thr Gly Gly His Arg Arg Val Glu Leu
100 105 110

Ala Val Glu Asp Thr Gly Leu Glu Trp Thr His Val Arg Pro Gly Glu
115 120 125

Phe Ala Leu Asn Lys Val Thr Leu Trp Ala Pro Ser Ile Arg Ala Glu
130 135 140

Gly Val Val Arg Ser Ala Tyr Pro Asp Ala Arg Val Ala Pro Val His
145 150 155 160

Glu Ala Asp Val Ala Ala Val Ala Val Thr Ala Leu Leu Lys Glu Gly
165 170 175

His Ala Gly Arg Ala Tyr Ser Val Thr Gly Pro Gln Ala Leu Thr Gln
180 185 190

Arg Glu Gln Val Arg Ala Val Gly Glu Gly Leu Gly Arg Ser Leu Ala
195 200 205

Phe Val Glu Val Thr Pro Gly Gln Ala Arg Ala Asp Leu Thr Ala Gln
210 215 220

Gly Leu Pro Ala Pro Ile Ala Asp Tyr Val Leu Ala Phe Gln Ala Gly
225 230 235 240

Trp Thr Glu Arg Pro Ala Pro Ala Arg Pro Thr Val Arg Glu Val Thr
245 250 255

Gly Arg Pro Ala Arg Thr Leu Ala Gln Trp Ala Ala Asp His Arg Ala
260 265 270

Asp Phe Arg
275

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: over 424 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(D) OTHER INFORMATION: /note= "translate of snogE"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:



10 



15 



Ala Val Pro Leu Ala Trp Ala Leu Arg Ser Ala Gly His Glu Val Arg
20 25 30

Val Ala Gly Gln Pro Ala Leu Thr Ser Thr Ile Thr Gly Ala Gly Leu
35 40 45

Thr Ala Val Pro Val Gly Arg Asp His Thr His Gly Ser Leu Leu Gly
50 55 60

Arg Val Gly Ser Asp Ile Leu Ala Leu His Asp Glu Ala Asp Tyr Leu
65 70 75 80

Glu Ala Arg His Asp Ala Leu Gly Phe Glu Phe Leu Lys Gly His Asn
85 90 95

Thr Val Met Ser Ala Leu Phe Tyr Ser Gln Ile Asn Asn Asp Ser Met
100 105 110

Val Asp Asp Leu Val Asp Phe Ala Arg His Trp Arg Pro Asp Leu Val
115 120 125

Val Trp Glu Pro Phe Thr Phe Ala Gly Ala Val Ala Ala Arg Ala Ser
130 135 140

Gly Ala Ala His Ala Arg Leu Leu Ser Phe Pro Asp Leu Phe Leu Ser
145 150 155 160

Thr Arg Arg Leu Phe Leu Glu Arg Met Ala Arg Gln Glu Pro Glu His
165 170 175

His Asp Asp Thr Leu Ala Glu Trp Leu Asp Trp Thr Leu Gly Arg His
180 185 190

Gly His Ser Phe Asp Glu Glu Ile Val Thr Gly Gln Trp Ser Ile Asp
195 200 205

Gln Thr Pro Ala Pro Val Arg Leu Asp Ala Gly Gly Pro Thr Val Pro
210 215 220

Met Arg Tyr Val Pro Tyr Ser Gly Leu Val Pro Thr Val Val Pro Asp
225 230 235 240

Trp Leu Arg Arg Pro Pro Glu Arg Pro Arg Val Leu Val Thr Leu Gly
245 250 255

Ile Thr Ser Arg Arg Val Lys Ser Phe Leu Ala Val Ser Val Asp Asp
260 265 270

Leu Phe Glu Ala Val Ala Gly Leu Gly Val Glu Val Val Ala Thr Leu
275 280 285

Asp Ala Asp Gln Arg Glu Leu Leu Gly Arg Val Pro Asp His Phe Arg
290 295 300

Ile Val Glu His Val Pro Leu Asp Ala Val Leu Pro Thr Cys Ser Ala
305 310 315 320

Ile Val His His Gly Gly Ala Gly Thr Trp Ser Thr Ala Ala Val Tyr
325 330 335

Gly Val Pro Gln Val Ser Leu Gly Ser Met Trp Asp His Phe Tyr Arg
340 345 350

Ala Arg Arg Leu Glu Glu Leu Gly Ala Gly Leu Arg Leu Pro Ser Gly
355 360 365

Glu Leu Thr Ala Glu Gly Leu Arg Thr Arg Leu Glu Arg Val Leu Gly
370 375 380

Glu Pro Ser Phe Gly Thr Ala Ala Gln Ala Leu Ser Asp Thr Ile Ala
385 390 395 400

Ala Glu Pro Ser Pro Ser Glu Val Val Pro Val Leu Glu Glu Leu Thr
405 410 415

Gly Arg His Arg Pro Gly Thr Arg
420



















(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 139 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(D) OTHER INFORMATION: /note= "translate of snoL" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Ser Thr Thr Ala Asn Lys Glu Arg Cys Leu Glu Met Val Ala Ala
1 5 10 15

Trp Asn Arg Trp Asp Val Ser Gly Val Val Ala His Trp Ala Pro Asp
20 25 30

Val Val His Tyr Asp Asp Glu Asp Lys Pro Val Ser Ala Glu Glu Val
35 40 45

Val Arg Arg Met Asn Ser Ala Val Glu Ala Phe Pro Asp Leu Arg Leu
50 55 60

Asp Val Arg Ser Ile Val Gly Glu Gly Asp Arg Val Met Leu Arg Ile
65 70 75 80

Thr Cys Ser Ala Thr His Gln Gly Val Phe Met Gly Ile Ala Pro Thr
85 90 95

Gly Arg Lys Val Arg Trp Thr Tyr Leu Glu Glu Leu Arg Phe Ser Glu
100 105 110

Ala Gly Lys Val Val Glu His Trp Asp Val Phe Asn Phe Ser Pro Leu
115 120 125

Phe Arg Asp Leu Gly Val Val Pro Asp Gly Leu
130 135

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 155 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(D) OTHER INFORMATION: /note= "translate of snoO"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Ser Val Arg Thr Asp Gln Thr Ala Ala Pro Glu Asp Arg Ala Ala
1 5 10 15

Ala Thr Asp Pro Gly Phe Gly His Leu Tyr Ala Gln Val Gln Gln Phe
20 25 30

Tyr Ala Arg Gln Met Gln Leu Leu Asp Ser Gly Ala Ala Glu Glu Trp
35 40 45

Ala Ala Thr Phe Thr Glu Asp Gly Thr Phe Ala Arg Pro Ser Ser Pro
50 55 60

Glu Pro Ala Arg Gly His Ala Glu Leu Ala Ala Gly Ala Arg Ala Ala
65 70 75 80

Ala Glu Arg Leu Ala Ala Glu Gly Leu Ser His Arg His Val Ile Gly
85 90 95

Met Thr Ala Val Arg Arg Glu Pro Asp Gly Ser Val Phe Val Arg Ser
100 105 110

Tyr Ala Gln Val Phe Ala Thr Arg Arg Gly Glu Ala Pro Arg Leu His
115 120 125

Leu Ile Cys Val Cys Glu Asp Val Leu Val Arg Glu Gly Pro Gly Leu
130 135 140

Lys Val Arg Glu Arg Val Val Thr His Asp Ala
145 150 155

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 281 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(D) OTHER INFORMATION: /note= "translate of snoaF" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 









Val Arg Ala Met Thr Asp Ser Thr Gly Pro Arg Pro Val Pro Ala Met
1 5 10 15








yer Fro 


Ala 


Pro 
20 




Pro 


Thr 


Pro 


Set: 
25 


Pro 


Gly 


Pro 


Ala 


pro 
30 


Gly 


Ser 







- 


Glu Pro Ala Pro Leu Ala Val Ile Val Thr Gly Gly Gly Ser Gly Ile
35 40 45

Gly Arg Ala Thr Ala Arg Ala Phe Ala Ala Gln Gly Ala Lys Val Leu
50 55 60

Val Val Gly Arg Thr Glu Asp Ala Leu Ala Gln Thr Ala Glu Gly Cys
65 70 75 80

Ala Asp Met Arg Val Leu Val Ala Asp Val Ala Ser Pro Asp Gly Pro
85 90 95

Gln Ala Val Val Asn Ala Ala Leu Arg Glu Phe Gly Arg Ile Asp Val
100 105 110

Leu Val Asn Asn Ala Ala Val Ala Gly Met Glu Thr Leu Gln Thr Val
115 120 125

Asp Arg Asp Ala Val Ala Arg Gln Phe Gly Thr Asn Leu Thr Ala Pro
130 135 140

Leu Phe Leu Val Gln Ser Ala Leu Gly Ala Leu Glu Lys Ser Arg Gly
145 150 155 160

Ile Val Val Asn Val Gly Thr Ala Ala Thr Leu Gly Leu Arg Ala Ala
165 170 175

Pro Thr Gly Ala Leu Tyr Gly Ala Ser Lys Val Ala Leu Asp Tyr Leu
180 185 190

Thr Arg Thr Trp Ala Val Glu Leu Ala Pro Arg Gly Ile Arg Val Val
195 200 205

Gly Val Ala Pro Gly Val Ile Asp Thr Gly Ile Gly Val Arg Met Gly
210 215 220

Met Thr Pro Glu Gly Tyr Arg Glu Phe Leu Thr Gly Met Gly Gly Arg
225 230 235 240

Val Pro Val Gly Arg Val Gly Arg Pro Glu Asp Val Ala Trp Trp Ile
245 250 255


> 
> 


> 


1 


Val Gin 
Pro Val 


Leu 

Asp 
275 


Ala 
260 

Gly 


Arg 
Gly 


Pro 
Leu 


Glu 
Ser 


Ala 

Leu 
280 


Gly 
265 

Val 


Tyr 


Ala 


Thr 


Gly 


Met 
270 


Val 


Val 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 190 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(D) OTHER INFORMATION: /note= "translate of snoN" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Val Gln Glu Thr Glu Pro Gly Val Pro Ala Asp Leu Pro Ala Glu Ser 
1 5 10 15 



Asp Pro Ala Ala Leu Glu Arg Leu Ala Ala Arg Tyr Arg Arg Asp Gly 
20 25 30 

Tyr Val His Val Pro Gly Val Leu Asp Ala Gly Glu Val Ala Glu Tyr 
35 40 45 

Leu Ala Glu Ala Arg Arg Leu Leu Ala His Glu Glu Ser Val Arg Trp 
50 55 60 

Gly Ser Gly Ala Gly Thr Val Met Asp Tyr Val Ala Asp Ala Gln Leu 
65 70 75 80 

Gly Ser Asp Thr Met Arg [illegible] 
85 90 95 

Leu Ala [illegible] Leu Ala [illegible] Glu 
100 105 110 

Val Leu Leu Lys Glu Asn [illegible] Asp Ala Ser Val Pro Thr Ala 
115 120 125 

Pro His His Asp Ala Phe [illegible] Ala Phe Pro [illegible] Gly Thr Ala 
130 135 140 

Leu Thr Ala Trp Val Ala Leu Val Asp Val Pro Val Glu Arg Gly Cys 
145 150 155 160 

Met Thr Phe Val Pro Gly Ser His Leu Leu Pro Asp Pro Asp Thr Gly 
165 170 175 

Asp Glu Pro Trp Ala Gly Ala Phe Thr Arg Pro Gly Glu Ile 
180 185 190 
Claims 



1. Isolated and purified DNA fragment, which is the gene cluster for the anthracycline 
biosynthetic pathway of the bacterium Streptomyces nogalater, being included 
in a 10 kb and a 7 kb flanked BglII fragments of S. nogalater genome. 

2. The DNA fragment according to claim 1, comprising the nucleotide sequence 
given in SEQ ID NO:1, or a sequence showing at least 80% homology to said 
sequence. 

3. A recombinant DNA, which comprises the DNA fragment according to claim 1 or 
2, cloned in a plasmid replicating in Streptomyces. 

4. The recombinant DNA according to claim 3, which is the plasmid pSY15c, 
comprising a 1.4 kb BamHI-SacI fragment from the plasmid pS[illegible] 
MluI-KpnI fragment from the plasmid pSY43. 



5. Plasmid pSY42, deposited in S. lividans strain TK24/pSY42 with the deposition 
number DSM 12451. 

6. Plasmid pSY43, deposited in S. lividans strain TK24/pSY43 with the deposition 
number DSM 12452. 

7. A process for the production of hybrid compounds, comprising transferring the 
DNA fragment according to claim 1 or 2 into a Streptomyces host, cultivating the 
recombinant strain obtained, and isolating the compounds produced. 



8. The process according to claim 7, wherein the Streptomyces host is a Streptomyces 
galilaeus host. 



9. The process according to claim 8, wherein the Streptomyces galilaeus host is 
selected from the strains H026, H039, H063 and H075, which are mutant strains of 
S. galilaeus ATCC 31615. 

10. The process according to claim 8, wherein an anthracycline is produced, which 
has the following formula I 




11. The process according to claim 8, wherein an anthracyclinone is produced, which 
has the following formula II 




12. A process for the production of hybrid compounds, comprising transferring at 
least one of the genes selected from the group consisting of snogi, snogA, snoaM, 
snogN, snoaG, snogC, snogK, snoaL, snoK, snogD, snoW, snogB, snoL, snoO and 
snoaF into a Streptomyces host, said genes being derived from the DNA fragment of 
claim 1 or 2, cultivating the recombinant strain obtained, and isolating the 
compounds produced. 

13. The process according to claim 12, wherein the gene snoaL, encoding NAME 
cyclase, is transferred into a Streptomyces host. 

14. The process according to claim 12, wherein at least one of the genes snogD and 
snogE encoding glycosyl transferases is transferred into a Streptomyces host. 

15. The process according to claim 12, wherein at least one of the genes snogi, 
snogN, snogC, snogK and snogA affecting the formation of nogalamine and nogalose 
is transferred into a Streptomyces host. 



Abstract 
The present invention relates to the gene cluster for nogalamycin biosynthesis derived 
from Streptomyces nogalater, and the use of the genes therein to obtain novel hybrid 
antibiotics for drug screening. 